This part will teach you how to alter and transform your data with Pandas data manipulation. You will learn about various data sorting, filtering, and aggregation procedures, as well as how to carry out fundamental mathematical operations on your data.
Introduction
Data Manipulation
Data manipulation is the practice of modifying, reorganising, or cleansing data to make it more useful for analysis or decision-making. It covers a wide range of tasks such as data sorting, filtering, aggregation, and transformation.
The objective of data manipulation is to turn raw data into a format that is more usable for the intended audience or purpose. It can be done manually or with specialist software such as Excel or Python’s Pandas module. It is a critical phase in the data analysis process since it prepares the data for further analysis and visualisation. Data manipulation methods are frequently used to extract insights and make informed decisions in fields such as business, finance, and research.
Pandas DataFrame
A DataFrame is one of the two primary data structures provided by Python’s pandas package (the other is the Series). It is a two-dimensional, size-mutable, heterogeneous tabular data structure with rows and columns, much like a spreadsheet or SQL table.

A DataFrame is made up of one or more Series objects, which form the DataFrame’s columns. Each Series has a name and an index, and the DataFrame has its own index for identifying rows. A DataFrame can also have a hierarchical index with several levels (a MultiIndex), which allows more complex data structures to be represented.
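The relationship between a DataFrame, its Series columns, and its index can be seen in a small sketch. The `cars` table below is hypothetical sample data in the spirit of this tutorial, not the DataFrame used later on.

```python
import pandas as pd

# Hypothetical sample data (an assumption for illustration only)
cars = pd.DataFrame({
    'Make': ['Ford', 'Toyota', 'Ford'],
    'Model': ['Mustang', 'Corolla', 'F-150'],
    'Price': [20000, 18000, 30000],
})

# Selecting a single column returns a Series named after that column
price = cars['Price']
print(type(price).__name__)  # Series
print(price.name)            # Price

# The Series shares the row index of the DataFrame it came from
print(price.index.equals(cars.index))  # True

# Using two columns as the index produces a two-level hierarchical index
indexed = cars.set_index(['Make', 'Model']).sort_index()
print(indexed.index.nlevels)                    # 2
print(indexed.loc[('Ford', 'F-150'), 'Price'])  # 30000
```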
A DataFrame may be created from a variety of data sources, including a CSV file, an Excel file, a SQL database, or a Python dictionary. Once you’ve created a DataFrame, you may use it to do operations such as sorting, filtering, aggregating, and converting data.
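As a minimal sketch of two of these routes, the example below builds the same small table from a dictionary and from CSV text. The column names and values are made up for illustration, and `io.StringIO` stands in for a real file on disk. Excel files and SQL queries are read in a similar way with `pd.read_excel()` and `pd.read_sql()`, which need extra dependencies and are not shown here.

```python
import io
import pandas as pd

# From a Python dictionary: the keys become column names
from_dict = pd.DataFrame({'Make': ['Ford', 'Honda'], 'Price': [20000, 22000]})

# From CSV text; io.StringIO is a stand-in for a file path (assumption)
csv_text = "Make,Price\nFord,20000\nHonda,22000\n"
from_csv = pd.read_csv(io.StringIO(csv_text))

print(from_dict)
print(from_csv)
```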
DataFrames are commonly used for data analysis and modification. They are powerful, versatile, and simple to use, with a large number of functions and methods for working with data. Pandas is built on the NumPy library, which enables fast array operations, and DataFrames are designed to handle large datasets.
This makes the Pandas DataFrame a strong tool for managing large datasets in fields such as business, finance, and research.
Pandas Data Manipulation
Now we will go through Pandas data manipulation techniques. To demonstrate each technique, I will use a Pandas DataFrame about cars.
import pandas as pd
import numpy as np

# Create a dictionary of data
data = {'Make': ['Ford', 'Toyota', 'Chevrolet', 'Honda', 'Ford', 'Toyota'],
        'Model': ['Mustang', 'Corolla', 'Camaro', 'Civic', 'F-150', 'Camry'],
        'Year': [2015, 2017, 2019, 2018, 2016, 2020],
        'Mileage': [50000, 25000, 10000, 30000, 40000, 20000],
        'Price': [20000, 18000, 25000, 22000, 30000, 19000]}

# Create the Pandas dataframe
df = pd.DataFrame(data)

# Print the dataframe
print(df)
Output:
        Make    Model  Year  Mileage  Price
0       Ford  Mustang  2015    50000  20000
1     Toyota  Corolla  2017    25000  18000
2  Chevrolet   Camaro  2019    10000  25000
3      Honda    Civic  2018    30000  22000
4       Ford    F-150  2016    40000  30000
5     Toyota    Camry  2020    20000  19000
Pandas Drop Columns
With the drop() function and the axis argument set to 1, we can remove one or more columns from the DataFrame. We’ll remove the "Price" column here:
# Drop the 'Price' column
df = df.drop('Price', axis=1)
print(df)
Output:
        Make    Model  Year  Mileage
0       Ford  Mustang  2015    50000
1     Toyota  Corolla  2017    25000
2  Chevrolet   Camaro  2019    10000
3      Honda    Civic  2018    30000
4       Ford    F-150  2016    40000
5     Toyota    Camry  2020    20000
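To drop several columns at once, a list of labels can be passed. The sketch below uses the `columns=` keyword, which is equivalent to passing the labels with `axis=1`. It works on a separate, hypothetical `cars` table so that it does not interfere with `df`.

```python
import pandas as pd

# Hypothetical sample table, independent of the tutorial's df
cars = pd.DataFrame({
    'Make': ['Ford', 'Toyota'],
    'Model': ['Mustang', 'Corolla'],
    'Year': [2015, 2017],
    'Mileage': [50000, 25000],
    'Price': [20000, 18000],
})

# columns=[...] drops several columns in one call
trimmed = cars.drop(columns=['Mileage', 'Price'])
print(list(trimmed.columns))  # ['Make', 'Model', 'Year']

# drop() returns a new DataFrame; the original is left unchanged
print(list(cars.columns))  # ['Make', 'Model', 'Year', 'Mileage', 'Price']
```

Because drop() does not modify the DataFrame in place by default, the result has to be assigned to a name, as the tutorial does with `df = df.drop(...)`.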
Pandas Drop Rows
Use the drop() function together with the index labels of the rows you want to remove to drop one or more rows from the DataFrame. Now, we’ll remove the first and last rows:
# Drop the first and last rows
df = df.drop([0, 5])
print(df)
Output:
        Make    Model  Year  Mileage
1     Toyota  Corolla  2017    25000
2  Chevrolet   Camaro  2019    10000
3      Honda    Civic  2018    30000
4       Ford    F-150  2016    40000
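Note that drop() works with index labels, not positions, and that the remaining rows keep their old labels (1 to 4 above). The following sketch, on a hypothetical `cars` table, shows one way to renumber the rows afterwards and one way to drop the last row regardless of its label.

```python
import pandas as pd

# Hypothetical sample table, independent of the tutorial's df
cars = pd.DataFrame({
    'Model': ['Mustang', 'Corolla', 'Camaro', 'Civic'],
    'Year': [2015, 2017, 2019, 2018],
})

# Drop rows by label, then renumber the remaining rows from 0
remaining = cars.drop([0, 3]).reset_index(drop=True)
print(remaining.index.tolist())     # [0, 1]
print(remaining['Model'].tolist())  # ['Corolla', 'Camaro']

# Look up the last label through the index instead of hard-coding it
last_removed = cars.drop(cars.index[-1])
print(last_removed['Model'].tolist())  # ['Mustang', 'Corolla', 'Camaro']
```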
Pandas Rename Column
We may use the rename() function with a dictionary containing the old and new column names to rename a column in the DataFrame. Now, the "Make" column will be changed to "Brand":
# Rename the 'Make' column to 'Brand'
df = df.rename(columns={'Make': 'Brand'})
print(df)
Output:
       Brand    Model  Year  Mileage
1     Toyota  Corolla  2017    25000
2  Chevrolet   Camaro  2019    10000
3      Honda    Civic  2018    30000
4       Ford    F-150  2016    40000
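The same dictionary can hold several old/new pairs, and rename() also accepts a function that is applied to every column name. A short sketch on a hypothetical `cars` table (the new name "Odometer" is made up for illustration):

```python
import pandas as pd

# Hypothetical sample table, independent of the tutorial's df
cars = pd.DataFrame({'Make': ['Ford'], 'Model': ['Mustang'], 'Mileage': [50000]})

# Rename several columns in one call
renamed = cars.rename(columns={'Make': 'Brand', 'Mileage': 'Odometer'})
print(list(renamed.columns))  # ['Brand', 'Model', 'Odometer']

# Pass a function to transform every column name, here to lower case
lowered = cars.rename(columns=str.lower)
print(list(lowered.columns))  # ['make', 'model', 'mileage']
```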
Pandas Select Columns With Specific Data Types
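The select_dtypes() function may be used to choose columns with a certain data type. The outputs from this point on are based on the original DataFrame, so re-create df from the data dictionary before running the remaining examples. We’ll choose columns of the object type here: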
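# Re-create the original dataframe (earlier steps dropped rows and columns)
df = pd.DataFrame(data)

# Select columns of type 'object'
obj_cols = df.select_dtypes(include=['object'])
print(obj_cols)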
Output:
        Make    Model
0       Ford  Mustang
1     Toyota  Corolla
2  Chevrolet   Camaro
3      Honda    Civic
4       Ford    F-150
5     Toyota    Camry
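The same function can select all numeric columns, or exclude a type instead of including it. A minimal sketch on a hypothetical `cars` table:

```python
import pandas as pd

# Hypothetical sample table with text, integer, and float columns
cars = pd.DataFrame({
    'Make': ['Ford', 'Toyota'],
    'Year': [2015, 2017],
    'Price': [20000.0, 18000.0],
})

# 'number' matches every numeric dtype (integers and floats)
numeric = cars.select_dtypes(include='number')
print(list(numeric.columns))  # ['Year', 'Price']

# exclude= keeps everything except the given types
non_numeric = cars.select_dtypes(exclude='number')
print(list(non_numeric.columns))  # ['Make']
```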
Pandas Slicing Dataset
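To slice the DataFrame, we may use the iloc[] indexer, which selects by integer position. Positions start at 0 and the end of each slice is excluded, so 1:4 picks positions 1 through 3. We’ll pick rows 1 through 3 and columns 1 through 3 here: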
# Slice the dataframe
df_slice = df.iloc[1:4, 1:4]
print(df_slice)
Output:
     Model  Year  Mileage
1  Corolla  2017    25000
2   Camaro  2019    10000
3    Civic  2018    30000
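A common source of confusion is that iloc[] slices exclude the end position, while label-based loc[] slices include the end label. The sketch below contrasts the two on a hypothetical `cars` table.

```python
import pandas as pd

# Hypothetical sample table with the default 0..3 row labels
cars = pd.DataFrame({
    'Make': ['Ford', 'Toyota', 'Chevrolet', 'Honda'],
    'Model': ['Mustang', 'Corolla', 'Camaro', 'Civic'],
    'Year': [2015, 2017, 2019, 2018],
})

# iloc: positions 1 and 2 (end excluded), columns at positions 0 and 1
by_position = cars.iloc[1:3, 0:2]
print(by_position.shape)  # (2, 2)

# loc: labels 1, 2 and 3 (end included), columns 'Make' through 'Model'
by_label = cars.loc[1:3, 'Make':'Model']
print(by_label.shape)  # (3, 2)
```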
Pandas Handle Duplicates
We may use the drop_duplicates() function to manage duplicates in the DataFrame. Here we’ll remove rows that repeat an earlier combination of values in the "Make" and "Model" columns, keeping the first occurrence of each:
# Remove rows with duplicate values in 'Make' and 'Model' columns
df_unique = df.drop_duplicates(subset=['Make', 'Model'])
print(df_unique)
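Output (the sample data contains no repeated "Make" and "Model" pair, so all six rows are kept):
        Make    Model  Year  Mileage  Price
0       Ford  Mustang  2015    50000  20000
1     Toyota  Corolla  2017    25000  18000
2  Chevrolet   Camaro  2019    10000  25000
3      Honda    Civic  2018    30000  22000
4       Ford    F-150  2016    40000  30000
5     Toyota    Camry  2020    20000  19000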
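To see drop_duplicates() actually remove something, the sketch below uses a hypothetical `cars` table in which the Ford Mustang is listed twice. It also shows the `keep` parameter and the related duplicated() method.

```python
import pandas as pd

# Hypothetical sample table: the Ford Mustang appears twice
cars = pd.DataFrame({
    'Make': ['Ford', 'Toyota', 'Ford'],
    'Model': ['Mustang', 'Corolla', 'Mustang'],
    'Price': [20000, 18000, 21000],
})

# By default the first occurrence of each Make/Model pair is kept
first_kept = cars.drop_duplicates(subset=['Make', 'Model'])
print(first_kept['Price'].tolist())  # [20000, 18000]

# keep='last' keeps the last occurrence instead
last_kept = cars.drop_duplicates(subset=['Make', 'Model'], keep='last')
print(last_kept['Price'].tolist())  # [18000, 21000]

# duplicated() flags the rows that drop_duplicates() would remove
print(cars.duplicated(subset=['Make', 'Model']).tolist())  # [False, False, True]
```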
Pandas Select Specific Values In Column
We may use the loc[] indexer with a Boolean condition to keep only the rows that hold certain values in a column. For example, suppose we wish to pick just the rows where the Make column equals "Ford":
ford_df = df.loc[df['Make'] == 'Ford']
print(ford_df)
Output:
   Make    Model  Year  Mileage  Price
0  Ford  Mustang  2015    50000  20000
4  Ford    F-150  2016    40000  30000
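Conditions can be combined with `&` (and) and `|` (or), with each condition wrapped in parentheses, and isin() matches any value from a list. The sketch below shows both on a hypothetical `cars` table. The price threshold is an arbitrary value chosen for illustration.

```python
import pandas as pd

# Hypothetical sample table, independent of the tutorial's df
cars = pd.DataFrame({
    'Make': ['Ford', 'Toyota', 'Chevrolet', 'Ford'],
    'Model': ['Mustang', 'Corolla', 'Camaro', 'F-150'],
    'Price': [20000, 18000, 25000, 30000],
})

# Combine two conditions; the parentheses around each one are required
pricey_fords = cars.loc[(cars['Make'] == 'Ford') & (cars['Price'] > 25000)]
print(pricey_fords['Model'].tolist())  # ['F-150']

# isin() matches several values; the second argument picks the columns
selected = cars.loc[cars['Make'].isin(['Ford', 'Toyota']), ['Make', 'Model']]
print(selected['Model'].tolist())  # ['Mustang', 'Corolla', 'F-150']
```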
Pandas Group By DataFrame
To use group by in a DataFrame, first group the data by one or more columns, and then apply a function to the groups. For instance, suppose we wish to group the data by the Make column and compute the average price for each make:
make_price_avg = df.groupby('Make')['Price'].mean()
print(make_price_avg)
Output:
Make
Chevrolet    25000.0
Ford         25000.0
Honda        22000.0
Toyota       18500.0
Name: Price, dtype: float64
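Several aggregations can be computed in one pass with agg(). The sketch below uses named aggregation on a hypothetical `cars` table. The output column names (`avg_price`, `max_mileage`, `cars_listed`) are made up for illustration.

```python
import pandas as pd

# Hypothetical sample table, independent of the tutorial's df
cars = pd.DataFrame({
    'Make': ['Ford', 'Toyota', 'Ford', 'Toyota'],
    'Price': [20000, 18000, 30000, 19000],
    'Mileage': [50000, 25000, 40000, 20000],
})

# Each keyword names an output column: (input column, aggregation)
summary = cars.groupby('Make').agg(
    avg_price=('Price', 'mean'),
    max_mileage=('Mileage', 'max'),
    cars_listed=('Price', 'count'),
)
print(summary)
```

The result is a DataFrame indexed by Make, with one column per named aggregation.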
Summary
Python’s Pandas package is a powerful tool for analysing and manipulating data. It offers several methods for manipulating data, including removing rows and columns, renaming columns, choosing columns according to the type of data they contain, slicing the dataset, managing duplicates, choosing certain values in columns, and grouping data.
Dropping rows and columns is among the most common data manipulation tasks in pandas. The drop() function removes rows and columns from a DataFrame based on their labels.
The Pandas rename() function, which can accept a dictionary of old and new column names as input, can be used to rename columns.
You may choose columns depending on their data types, such as string or numeric data, by using the Pandas select_dtypes() function.
The Pandas iloc[] indexer, which selects rows and columns by their integer positions in the DataFrame, may be used to slice the dataset.
The Pandas drop_duplicates() function, which eliminates duplicate rows depending on the values of the provided columns, can be used to handle duplicate rows.
The Pandas loc[] indexer, which selects rows and columns by their labels or by a Boolean condition, can be used to choose particular values in columns.
The Pandas groupby() function groups the data according to the values of one or more columns. Aggregations such as sum, mean, or count can then be applied to each group.