In this post I will present an introduction to Pandas and show you how to set up a Pandas Jupyter Notebook so you can start working with and learning Pandas. This is part of the Pandas Tutorial on our website!
Introduction to Pandas
Pandas
Pandas is an open-source Python library commonly used for data manipulation and analysis. It provides powerful data structures and data manipulation tools that make working with large datasets simple. In this post, we will cover the fundamentals of Pandas, such as its data structures, data input/output, and data manipulation capabilities.
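Since data input/output is one of the fundamentals mentioned above, here is a minimal sketch of reading and writing CSV data. The CSV text is made up for illustration, and io.StringIO stands in for a real file path such as "data.csv":

```python
import io

import pandas as pd

# Made-up CSV content; in practice you would usually pass a file path instead
csv_text = """name,age,city
Tom,25,New York
Paul,30,Los Angeles
George,35,Chicago
"""

# read_csv accepts a path or any file-like object, so StringIO lets us read from a string
df = pd.read_csv(io.StringIO(csv_text))
print(df)

# With no path argument, to_csv returns the CSV text as a string;
# index=False leaves out the row labels
out = df.to_csv(index=False)
print(out)
```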
There are several benefits of learning Pandas:
- Ease of Use: Pandas is a user-friendly library that makes working with data simple, even for people with limited programming experience. Its simple syntax and robust data manipulation features make it a popular choice among data scientists and analysts.
- Data Cleaning and Preparation: Pandas has several tools for cleaning and preparing data for analysis. It lets you deal with missing and duplicate data, perform simple data transformations, and add new columns or rows.
- Data Manipulation: Pandas offers many data manipulation operations, such as sorting, filtering, and aggregating. This makes it simple to manipulate large datasets and get valuable insights from them.
- Data Visualization: Pandas works well with popular data visualisation libraries like Matplotlib and Seaborn, and its built-in plotting methods use Matplotlib by default. This lets you create a variety of visualisations, including line plots, scatter plots, bar plots, and more.
- Handling Different Data Types: Pandas can handle many kinds of data, including text, numerical, and categorical data, which makes it suitable for a wide range of datasets.
- Interoperability: Pandas works well with other libraries like NumPy, Scikit-learn, and TensorFlow, making it a flexible toolkit for a variety of data science tasks.
- High Performance: Pandas is built on top of NumPy, a high-performance array-processing package. This lets it handle large in-memory datasets efficiently and run vectorized computations quickly.
- Widely Used: Pandas is widely used in the data science field, which makes it an important skill to have. It is employed in areas including banking, marketing, and healthcare, as well as in academic research.
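To make the cleaning and manipulation points above more concrete, here is a small sketch on a made-up dataset: it removes a duplicate row, fills a missing value with the column mean (just one possible strategy), adds a derived column, and then sorts, filters, and aggregates:

```python
import numpy as np
import pandas as pd

# Made-up data: one duplicated 'banana' row and one missing price
df = pd.DataFrame({
    'product': ['apple', 'banana', 'banana', 'cherry'],
    'price': [1.2, 0.5, 0.5, np.nan],
    'quantity': [10, 20, 20, 5],
})

df = df.drop_duplicates()                                     # drop the repeated 'banana' row
df = df.assign(price=df['price'].fillna(df['price'].mean()))  # fill the missing price with the mean
df = df.assign(total=df['price'] * df['quantity'])            # add a column derived from others

print(df.sort_values('total', ascending=False))  # sorting
print(df[df['quantity'] > 5])                    # filtering
print(df['total'].sum())                         # aggregating
```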
Set Up Pandas Jupyter Notebook
In this Introduction to Pandas Tutorial, I recommend learning Pandas with Jupyter Notebook. The easiest way to set up Jupyter Notebook is to use the official Docker image, which comes with Pandas and the rest of the data science stack pre-installed.
Jupyter Notebook is a web-based interactive development environment (IDE) that lets you create and share documents containing live code, equations, visualisations, and narrative text. It is mainly used for data science and scientific computing, but it can also be used for data cleaning, machine learning, and other tasks.
Jupyter Notebook uses IPython as its default Python kernel and, through additional kernels, supports a variety of programming languages, including Python, R, Julia, and others. Each notebook is made up of cells that can contain code, Markdown text, or raw text. You can run the code in a cell by clicking the “Run” button or by pressing Shift + Enter on the keyboard.
Jupyter Notebook also supports interactive widgets, inline plots, and LaTeX equations, making it well suited for data analysis and scientific computing. Notebooks are saved as .ipynb files, so they are easy to share with others and can be tracked with version control tools such as Git.
Set Up Pandas in Jupyter Notebook
Because we want all our work to persist on the local computer, please create a new workspace directory on your computer. In my case it’s Pandas:
mkdir Pandas
cd Pandas
Getting Started With Pandas In Jupyter Notebook
To run Jupyter Notebook, use the following command:
docker run -it --rm -p 8888:8888 -v "${PWD}":/home/jovyan/work jupyter/datascience-notebook:latest
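As you can see, the docker run command includes a volume option (-v) that mounts your current working directory, the one from which you run the command, into the container as /home/jovyan/work. That way all your notebooks are visible inside the Docker container and also persist on your local machine. This matters because --rm deletes the container when it stops, while -p 8888:8888 publishes the Jupyter server port so you can reach it from your browser. Newer builds of this image are published on Quay.io as quay.io/jupyter/datascience-notebook.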
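After running this command, you will see links to open in your browser. You can use either URL; I prefer the second one (127.0.0.1), because the hostname in the first one (2788586c6521) is the container’s ID, which your host machine usually cannot resolve: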
To access the server, open this file in a browser:
    file:///home/jovyan/.local/share/jupyter/runtime/jpserver-8-open.html
Or copy and paste one of these URLs:
    http://2788586c6521:8888/lab?token=2e8e028f331c58055b0c80adac09ff3c6a92cdc4b963eb6a
    or http://127.0.0.1:8888/lab?token=2e8e028f331c58055b0c80adac09ff3c6a92cdc4b963eb6a
Then you should see the JupyterLab interface in your browser.
In the file browser on the left side, open the work directory. Now we are ready to move on and learn Pandas Series and DataFrame through Jupyter Notebook!
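To confirm the setup works, you could create a new notebook inside the work directory and run a first cell like the one below. It is only a sanity check; the versions it prints depend on the Docker image you pulled:

```python
# First cell of a new notebook: check that pandas and NumPy can be imported
import numpy as np
import pandas as pd

print(pd.__version__)  # exact version depends on the image
print(np.__version__)
```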
Pandas Series And DataFrame
Now I will try to help you understand Pandas Series and DataFrame.
Pandas Series
A Pandas Series is a one-dimensional array-like object that can hold any type of data. It is analogous to a spreadsheet column or a database table column. Each element in a Series has a label, and together these labels form the index, which can be used to retrieve and modify the data. Index labels do not have to be unique.
A Series can be created by passing a list of values to the pd.Series() constructor. For example:
import pandas as pd

s = pd.Series([1, 3, 5, 4, 6, 8])
print(s)
The resulting Series has a default index, which is a range of integers starting from 0. However, you can also specify your own index when creating a Series. For example:
s = pd.Series([1, 3, 5, 4, 6, 8], index=['a', 'b', 'c', 'd', 'e', 'f'])
print(s)
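With a custom index, you can retrieve and modify values by label. Here is a short sketch using the same Series; it also shows .loc (label-based) next to .iloc (position-based):

```python
import pandas as pd

s = pd.Series([1, 3, 5, 4, 6, 8], index=['a', 'b', 'c', 'd', 'e', 'f'])

print(s['c'])       # access by label -> 5
print(s.loc['c'])   # same thing, using the explicit label-based accessor
print(s.iloc[2])    # access by integer position -> 5 (the third element)

s['c'] = 50              # modify a value by its label
print(s.loc['b':'d'])    # label slices include both endpoints: b, c, d
```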
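A Series can also be created from a NumPy array, a Python dictionary, or a scalar value:
import numpy as np

s1 = pd.Series(np.random.randn(5))          # from a NumPy array
s2 = pd.Series({'a': 1, 'b': 2})            # from a dictionary: the keys become the index
s3 = pd.Series(5, index=['x', 'y', 'z'])    # from a scalar: the value is repeated for every index label
print(s1, s2, s3, sep='\n\n')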
Pandas DataFrame
A Pandas DataFrame is a two-dimensional table of data arranged in rows and columns. It is comparable to a spreadsheet or a SQL table. Each column in a DataFrame is a Series with its own label, known as the column name. Each row in a DataFrame has its own label, known as the index.
A DataFrame can be built from several kinds of data, including NumPy arrays, Python dictionaries, and lists of lists. For example, you can construct a DataFrame from a Python dictionary as follows:
data = {'name': ['Tom', 'Paul', 'George'],
        'age': [25, 30, 35],
        'city': ['New York', 'Los Angeles', 'Chicago']}
df = pd.DataFrame(data)
print(df)
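Using the same dictionary, this sketch checks the structure described earlier: a single column comes back as a Series, and the column names and row index are available as separate attributes:

```python
import pandas as pd

data = {'name': ['Tom', 'Paul', 'George'],
        'age': [25, 30, 35],
        'city': ['New York', 'Los Angeles', 'Chicago']}
df = pd.DataFrame(data)

ages = df['age']            # selecting one column returns a Series
print(type(ages))
print(df.columns.tolist())  # ['name', 'age', 'city']
print(df.index.tolist())    # [0, 1, 2], the default integer index
print(df.loc[1])            # one row by its index label (Paul's row), also a Series
```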
A DataFrame can also be created from a NumPy array by supplying the column names and the row index:
import numpy as np

data = np.random.randn(5, 4)
df = pd.DataFrame(data,
                  columns=['a', 'b', 'c', 'd'],
                  index=['first', 'second', 'third', 'fourth', 'fifth'])
print(df)
Example output (the numbers will differ on each run because the data is random):
               a         b         c         d
first  -1.198920 -0.052465  0.963785 -0.358522
second -0.330139  2.460610 -0.680101 -0.880916
third  -0.865693  0.702937 -0.033988 -1.580205
fourth  1.454724  0.074386 -0.327213 -2.397623
fifth   0.063583  0.009302  0.817274  1.570743
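Lists of lists were mentioned as another source but not shown. A minimal sketch: each inner list becomes one row, and the column names are passed separately (the data simply reuses the earlier example):

```python
import pandas as pd

# Each inner list is one row of the table
rows = [['Tom', 25, 'New York'],
        ['Paul', 30, 'Los Angeles'],
        ['George', 35, 'Chicago']]
df = pd.DataFrame(rows, columns=['name', 'age', 'city'])
print(df)
print(df.dtypes)  # shows the data type pandas inferred for each column
```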
Once a DataFrame is created, you can perform various operations on it, such as:
- selecting, adding, and deleting columns
- filtering rows based on certain conditions
- sorting the data
- aggregating and summarizing the data
- merging and joining with other DataFrames
- and more.
We will cover these operations in the next parts of this Pandas Tutorial.
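As a short preview, here is a sketch of what each operation in the list above can look like. The data (including the extra 'Anna' row and the states table) is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'name': ['Tom', 'Paul', 'George', 'Anna'],
                   'age': [25, 30, 35, 30],
                   'city': ['New York', 'Los Angeles', 'Chicago', 'Chicago']})

# Selecting, adding, and deleting columns
names = df['name']
df['age_next_year'] = df['age'] + 1
df = df.drop(columns=['age_next_year'])

# Filtering rows based on a condition
over_28 = df[df['age'] > 28]

# Sorting the data
by_age = df.sort_values('age', ascending=False)

# Aggregating and summarizing
mean_age_by_city = df.groupby('city')['age'].mean()
print(mean_age_by_city)
print(df.describe())

# Merging with another DataFrame on a shared column
states = pd.DataFrame({'city': ['New York', 'Los Angeles', 'Chicago'],
                       'state': ['NY', 'CA', 'IL']})
merged = df.merge(states, on='city', how='left')
print(merged)
```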
Summary
That’s all for this introduction to Pandas!
Pandas’ primary data structures are the Series and the DataFrame. A Series is a one-dimensional array-like object that can hold any type of data. Each element has a label, and together these labels form the index, which can be used to retrieve and modify the data.
A DataFrame, like a spreadsheet or SQL table, is a two-dimensional data table with rows and columns. Each column in a DataFrame is a Series with its own label, known as the column name, and each row in a DataFrame has its own label, known as the index.
DataFrames can be built from several data sources, including NumPy arrays, Python dictionaries, and lists of lists. Once created, DataFrames support a variety of operations, such as selecting, adding, and removing columns; filtering rows; sorting; aggregating and summarising; and merging and joining with other DataFrames.
In this tutorial you have also learned how to set up Pandas in Jupyter Notebook.
Have A Nice Day!