Part 4. Pandas Data Manipulation Techniques – 8 Basic Techniques!

This part will teach you how to alter and reshape your data with pandas. You will learn eight basic techniques: dropping columns, dropping rows, renaming columns, selecting columns by data type, slicing, handling duplicates, selecting specific values in a column, and grouping data for aggregation.

Introduction

Data Manipulation

The practice of modifying, reorganising, or cleansing data to make it more helpful for analysis or decision-making is referred to as data manipulation. It entails a wide range of tasks such as data sorting, filtering, aggregation, and transformation.

The objective of data manipulation is to change raw data into a more usable format for the intended audience or purpose. It can be done manually or with specialist software such as Excel or Python’s pandas library. It is a critical phase in the data analysis process since it prepares the data for further analysis and visualisation. Data manipulation methods are frequently used to extract insights and make educated decisions in industries such as business, finance, and research.

Pandas DataFrame

A DataFrame is one of the two primary data structures supplied by Python’s pandas package (the other is the Series). It is a two-dimensional, size-mutable, heterogeneous tabular data structure containing rows and columns, much like a spreadsheet or SQL table.

A DataFrame is made up of one or more Series objects, which constitute the DataFrame’s columns. Each Series has a name and an index, and the DataFrame has its own index for identifying rows. A DataFrame can also have a hierarchical index (a MultiIndex) with several levels, allowing for more complicated data structures to be created.
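The relationship between a DataFrame, its column Series, and its index can be sketched as follows. The two-row `cars` table is a hypothetical example used only for illustration.

```python
import pandas as pd

# Hypothetical two-row table, used only for illustration
cars = pd.DataFrame({'Make': ['Ford', 'Toyota'], 'Year': [2015, 2017]})

# Selecting a single column returns a Series named after that column
year = cars['Year']
print(type(year).__name__, year.name)

# The column Series shares the DataFrame's row index
print(year.index.equals(cars.index))

# Using two columns as the index produces a hierarchical MultiIndex
indexed = cars.set_index(['Make', 'Year'])
print(type(indexed.index).__name__)
```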

A DataFrame may be created from a variety of data sources, including a CSV file, an Excel file, a SQL database, or a Python dictionary. Once you’ve created a DataFrame, you may use it to do operations such as sorting, filtering, aggregating, and converting data.
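As a minimal sketch of two of these routes, the example below builds the same small table from a dictionary and from CSV text. The column names and values are assumptions chosen for illustration, and `io.StringIO` stands in for a real file path.

```python
import io
import pandas as pd

# From a Python dictionary: keys become column names
from_dict = pd.DataFrame({'Make': ['Ford', 'Honda'], 'Price': [20000, 22000]})

# From CSV text; io.StringIO stands in for a path to a real .csv file
csv_text = "Make,Price\nFord,20000\nHonda,22000\n"
from_csv = pd.read_csv(io.StringIO(csv_text))

print(from_dict)
print(from_csv)
```

Excel files and SQL databases have their own readers (pd.read_excel() and pd.read_sql()), which need an extra engine or a database connection and are therefore not shown here.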

DataFrames are commonly used for data analysis and modification. They are powerful, versatile, and simple to use, with a large number of functions and methods for working with data. The pandas DataFrame is also designed to handle large datasets and is built on the NumPy library, which enables fast array operations.

For these reasons, DataFrames are frequently used in industries such as business, finance, and research to extract insights and make educated decisions.

Pandas Data Manipulation

Now we will go through the pandas data manipulation techniques. To present all of them, I will use a pandas DataFrame about cars.

import pandas as pd
import numpy as np

# Create a dictionary of data
data = {'Make': ['Ford', 'Toyota', 'Chevrolet', 'Honda', 'Ford', 'Toyota'],
        'Model': ['Mustang', 'Corolla', 'Camaro', 'Civic', 'F-150', 'Camry'],
        'Year': [2015, 2017, 2019, 2018, 2016, 2020],
        'Mileage': [50000, 25000, 10000, 30000, 40000, 20000],
        'Price': [20000, 18000, 25000, 22000, 30000, 19000]}

# Create the Pandas dataframe
df = pd.DataFrame(data)

# Print the dataframe
print(df)

Output:

        Make    Model  Year  Mileage  Price
0       Ford  Mustang  2015    50000  20000
1     Toyota  Corolla  2017    25000  18000
2  Chevrolet   Camaro  2019    10000  25000
3      Honda    Civic  2018    30000  22000
4       Ford    F-150  2016    40000  30000
5     Toyota    Camry  2020    20000  19000

Pandas Drop Columns

With the drop() function and the axis argument set to 1, we may remove one or more columns from the DataFrame. We’ll remove the "Price" column here:

# Drop the 'Price' column
df = df.drop('Price', axis=1)
print(df)

Output:

        Make    Model  Year  Mileage
0       Ford  Mustang  2015    50000
1     Toyota  Corolla  2017    25000
2  Chevrolet   Camaro  2019    10000
3      Honda    Civic  2018    30000
4       Ford    F-150  2016    40000
5     Toyota    Camry  2020    20000
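Several columns can be dropped in one call. The sketch below uses the columns keyword, which is equivalent to passing the labels with axis=1; the two-row `cars` table is a hypothetical stand-in for the article's data.

```python
import pandas as pd

# Hypothetical two-row table, used only for illustration
cars = pd.DataFrame({'Make': ['Ford', 'Toyota'],
                     'Model': ['Mustang', 'Corolla'],
                     'Mileage': [50000, 25000],
                     'Price': [20000, 18000]})

# columns=[...] is equivalent to passing the labels with axis=1
trimmed = cars.drop(columns=['Mileage', 'Price'])
print(trimmed.columns.tolist())  # ['Make', 'Model']

# drop() returns a new DataFrame; the original keeps all four columns
print(cars.columns.tolist())  # ['Make', 'Model', 'Mileage', 'Price']
```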

Pandas Drop Rows

To remove one or more rows from the DataFrame, pass the index labels of those rows to the drop() function. Now, we’ll remove the first and last rows:

# Drop the first and last rows
df = df.drop([0, 5])
print(df)

Output:

        Make    Model  Year  Mileage
1     Toyota  Corolla  2017    25000
2  Chevrolet   Camaro  2019    10000
3      Honda    Civic  2018    30000
4       Ford    F-150  2016    40000
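Rows can also be dropped based on a condition rather than hard-coded labels. This is a minimal sketch, assuming a hypothetical mileage threshold of 40000; it also shows reset_index(), which renumbers the remaining rows from 0.

```python
import pandas as pd

# Hypothetical three-row table, used only for illustration
cars = pd.DataFrame({'Make': ['Ford', 'Toyota', 'Honda'],
                     'Mileage': [50000, 25000, 30000]})

# Collect the index labels of rows above the (assumed) threshold
high_mileage = cars.index[cars['Mileage'] > 40000]
kept = cars.drop(high_mileage)

# The surviving rows keep their old labels (1, 2); renumber them from 0
kept = kept.reset_index(drop=True)
print(kept)
```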

Pandas Rename Column

We may use the rename() function with a dictionary containing the old and new column names to rename a column in the DataFrame. Now, the "Make" column will be changed to "Brand":

# Rename the 'Make' column to 'Brand'
df = df.rename(columns={'Make': 'Brand'})
print(df)

Output:

       Brand    Model  Year  Mileage
1     Toyota  Corolla  2017    25000
2  Chevrolet   Camaro  2019    10000
3      Honda    Civic  2018    30000
4       Ford    F-150  2016    40000
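rename() can change several columns at once, and it also accepts a function that is applied to every column name. The names 'ModelYear' and the one-row `cars` table below are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical one-row table, used only for illustration
cars = pd.DataFrame({'Make': ['Ford'], 'Model': ['Mustang'], 'Year': [2015]})

# Several columns can be renamed in a single call
renamed = cars.rename(columns={'Make': 'Brand', 'Year': 'ModelYear'})
print(renamed.columns.tolist())  # ['Brand', 'Model', 'ModelYear']

# A function is applied to every column name
lowered = cars.rename(columns=str.lower)
print(lowered.columns.tolist())  # ['make', 'model', 'year']
```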

Pandas Select Columns With Specific Data Types

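The select_dtypes() function may be used to choose columns with a certain data type. The previous steps overwrote df, so we first rebuild the original DataFrame from data; the remaining examples all use this original table. We’ll choose columns of the object type here:

# Rebuild the original DataFrame (earlier steps overwrote df)
df = pd.DataFrame(data)

# Select columns of type 'object'
obj_cols = df.select_dtypes(include=['object'])
print(obj_cols)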

Output:

        Make    Model
0       Ford  Mustang
1     Toyota  Corolla
2  Chevrolet   Camaro
3      Honda    Civic
4       Ford    F-150
5     Toyota    Camry
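The same function can pick out numeric columns, or exclude them. A small sketch with a hypothetical table that mixes text, integer, and float columns:

```python
import pandas as pd

# Hypothetical table mixing text, integer, and float columns
cars = pd.DataFrame({'Make': ['Ford', 'Toyota'],
                     'Year': [2015, 2017],
                     'Price': [20000.0, 18000.0]})

# 'number' matches every numeric dtype (integers and floats)
numeric = cars.select_dtypes(include='number')
print(numeric.columns.tolist())  # ['Year', 'Price']

# exclude= keeps everything except the listed dtypes
non_numeric = cars.select_dtypes(exclude='number')
print(non_numeric.columns.tolist())  # ['Make']
```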

Pandas Slicing Dataset

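To slice the DataFrame by position, we may use the iloc[] indexer. Positions start at 0 and the end of each slice is excluded, so 1:4 selects positions 1 through 3. We’ll pick the rows at positions 1 through 3 and the columns at positions 1 through 3 here: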

# Slice the dataframe
df_slice = df.iloc[1:4, 1:4]
print(df_slice)

Output:

     Model  Year  Mileage
1  Corolla  2017    25000
2   Camaro  2019    10000
3    Civic  2018    30000
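iloc[] slices by integer position, while loc[] slices by label and includes the end label. The sketch below, on a hypothetical four-row table, selects the same block both ways.

```python
import pandas as pd

# Hypothetical four-row table with the default 0..3 index
cars = pd.DataFrame({'Make': ['Ford', 'Toyota', 'Chevrolet', 'Honda'],
                     'Model': ['Mustang', 'Corolla', 'Camaro', 'Civic'],
                     'Year': [2015, 2017, 2019, 2018]})

# Position-based: rows 1 and 2, columns 0 and 1 (slice ends are excluded)
by_position = cars.iloc[1:3, 0:2]

# Label-based: rows labelled 1 through 2, columns 'Make' through 'Model'
# (slice ends are included)
by_label = cars.loc[1:2, 'Make':'Model']

print(by_position.equals(by_label))  # True
```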

Pandas Handle Duplicates

We may use the drop_duplicates() function to manage duplicates in the DataFrame. With the subset argument, a row counts as a duplicate when its combination of 'Make' and 'Model' values has already appeared in an earlier row; the first occurrence is kept. In our data every Make and Model pair is unique, so no rows are removed:

# Remove rows with duplicate values in 'Make' and 'Model' columns
df_unique = df.drop_duplicates(subset=['Make', 'Model'])
print(df_unique)

Output:

        Make    Model  Year  Mileage  Price
0       Ford  Mustang  2015    50000  20000
1     Toyota  Corolla  2017    25000  18000
2  Chevrolet   Camaro  2019    10000  25000
3      Honda    Civic  2018    30000  22000
4       Ford    F-150  2016    40000  30000
5     Toyota    Camry  2020    20000  19000
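To see the function actually remove something, here is a sketch on a hypothetical table that does contain a repeated Make and Model pair. It also shows duplicated(), which flags the repeats, and the keep argument, which controls which occurrence survives.

```python
import pandas as pd

# Hypothetical table where the Ford Mustang appears twice
cars = pd.DataFrame({'Make': ['Ford', 'Ford', 'Toyota'],
                     'Model': ['Mustang', 'Mustang', 'Camry'],
                     'Price': [20000, 21000, 19000]})

# duplicated() marks every repeat after the first occurrence
mask = cars.duplicated(subset=['Make', 'Model'])
print(mask.tolist())  # [False, True, False]

# keep='last' retains the final occurrence instead of the first
latest = cars.drop_duplicates(subset=['Make', 'Model'], keep='last')
print(latest['Price'].tolist())  # [21000, 19000]
```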

Pandas Select Specific Values In Column

We may use the loc[] indexer with a boolean condition to select the rows that hold certain values in a column. For example, suppose we wish to pick just the rows where the Make column equals "Ford":

ford_df = df.loc[df['Make'] == 'Ford']
print(ford_df)

Output:

   Make    Model  Year  Mileage  Price
0  Ford  Mustang  2015    50000  20000
4  Ford    F-150  2016    40000  30000
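Conditions can match several values or be combined. A minimal sketch on a hypothetical four-row table, using isin() and the & operator (each condition needs its own parentheses):

```python
import pandas as pd

# Hypothetical four-row table, used only for illustration
cars = pd.DataFrame({'Make': ['Ford', 'Toyota', 'Honda', 'Ford'],
                     'Year': [2015, 2017, 2018, 2016],
                     'Price': [20000, 18000, 22000, 30000]})

# isin() matches any value from a list
selected = cars.loc[cars['Make'].isin(['Ford', 'Honda'])]
print(selected.index.tolist())  # [0, 2, 3]

# Combine conditions with & (and) or | (or), each wrapped in parentheses
recent_ford = cars.loc[(cars['Make'] == 'Ford') & (cars['Year'] >= 2016)]
print(recent_ford['Price'].tolist())  # [30000]
```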

Pandas Group By DataFrame

To use group by in a DataFrame, first group the data by one or more columns with groupby(), and then apply a function to the groups. For instance, suppose we wish to group the data by the Make column and compute the average price for each make:

make_price_avg = df.groupby('Make')['Price'].mean()
print(make_price_avg)

Output:

Make
Chevrolet    25000.0
Ford         25000.0
Honda        22000.0
Toyota       18500.0
Name: Price, dtype: float64
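Several summaries per group can be computed in one pass with agg(). This is a sketch on a hypothetical four-row table rather than the full cars data:

```python
import pandas as pd

# Hypothetical table with two cars per make
cars = pd.DataFrame({'Make': ['Ford', 'Toyota', 'Ford', 'Toyota'],
                     'Price': [20000, 18000, 30000, 19000]})

# One row per make, one column per aggregation
summary = cars.groupby('Make')['Price'].agg(['mean', 'min', 'max', 'count'])
print(summary)
```

The result is a DataFrame indexed by Make, so a single figure can be read with, for example, summary.loc['Ford', 'mean'].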

Summary

Python’s pandas package is a powerful tool for analysing and manipulating data. It offers several methods for manipulating data, including removing rows and columns, renaming columns, choosing columns according to the type of data they contain, slicing the dataset, managing duplicates, choosing certain values in columns, and grouping data.

Dropping rows and columns is one of the most common data manipulation tasks in pandas. The drop() function removes the rows or columns in question from a DataFrame based on their labels.

The Pandas rename() function, which can accept a dictionary of old and new column names as input, can be used to rename columns.

You may choose columns depending on their data types, such as string or numeric data, by using the pandas select_dtypes() function.

The pandas iloc[] indexer, which selects rows and columns by their integer positions in the DataFrame, may be used to slice the dataset.

The Pandas drop_duplicates() function, which eliminates duplicate rows depending on the values of the provided columns, can be used to handle duplicate rows.

The pandas loc[] indexer, which selects rows and columns by their labels or by a boolean condition, can be used to choose the rows that hold particular values in a column.

By grouping the data according to the values of one or more columns, the Pandas groupby() function enables data grouping. The grouped data may then be subjected to other operations like sum, mean, or count.
