Part 2. Pandas Data Input and Output: CSV, Excel, SQL, JSON, HTML etc. – Pandas Tutorial – Over 10 Powerful API To Read/Write Data

Pandas Data Input and Output

This post covers data input and output in Pandas.

You will learn how to read and write data in a variety of formats, including CSV, Excel, SQL, HTML, Parquet and JSON. You will also see how to work with data from other sources, such as databases and web pages.

Introduction

Welcome to the Pandas data input and output section! Here we'll look at how to read and write data in several file formats.

We'll go through common formats including CSV, Excel, SQL, HTML, Parquet, and JSON. We will also look at ways of working with data from other sources, such as databases and web pages.

By the end of this post, you will have a solid grasp of how to import and export data with Pandas in a number of formats.

Most Common Data Formats

CSV

CSV stands for “Comma Separated Values”. It is a file format for storing tabular data in plain text. Each line of a CSV file represents a row, and the values within a row are separated by commas. This makes it a simple yet effective format for storing and exchanging data. The first line of a CSV file often contains the column headers, which identify the fields in the data. Numerous programmes, including Microsoft Excel and Google Sheets, and many programming languages, including Python, support CSV. It is also a popular format for exchanging data between systems.
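
To make the row-and-column structure concrete, here is a minimal sketch that parses a small CSV string with pd.read_csv(). The sample text and variable names are made up for illustration, and io.StringIO simply stands in for a file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV text: the first line holds the column headers,
# every following line is one row of comma-separated values.
csv_text = "ID,Name,Age\n1,AAA,10\n2,BBB,20\n"

# read_csv accepts a path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)          # (number of rows, number of columns)
print(list(df.columns))  # header names taken from the first line
```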

Excel

Excel is a spreadsheet programme created by Microsoft. It is used to create and work with several kinds of data, including numbers, text, and formulae. Excel files end in “.xls” or “.xlsx” and hold data in a tabular layout, akin to a table in a relational database. Each sheet in an Excel workbook can be treated as a table, with individual cells holding the values. Excel has many built-in data manipulation and analysis features, such as sorting, filtering, and charting.

JSON

JSON stands for JavaScript Object Notation. It is a lightweight data-interchange format that is easy for people to read and write and easy for machines to parse and generate. JSON is a language-independent text format, but it uses conventions familiar to programmers of the C family of languages, which includes C, C++, C#, Java, JavaScript, Perl, Python, and many more. These characteristics make JSON an excellent data-interchange format.

A JSON object is a set of key-value pairs in which the keys are strings and the values can be strings, numbers, booleans, null, arrays, or other JSON objects. The syntax is derived from JavaScript object literal syntax. JSON is frequently used to send information between a server and a web application, or between different parts of a web service.
Pandas supports reading and writing JSON data using the pd.read_json() function and the DataFrame.to_json() method, respectively. This lets you work with JSON data in Python and carry out data manipulation and analysis with ease.
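
As a minimal sketch of how JSON maps onto a DataFrame, the snippet below parses an array of objects, where each object becomes one row. The JSON text is invented for illustration.

```python
import io

import pandas as pd

# Hypothetical JSON document: an array of objects, one object per row
json_text = '[{"ID": 1, "Name": "AAA"}, {"ID": 2, "Name": "BBB"}]'

# Wrapping the string in StringIO avoids the deprecation warning that
# recent pandas versions emit when given a literal JSON string
df = pd.read_json(io.StringIO(json_text))

print(df)
```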

Parquet

Parquet is a columnar storage format for big data. It is designed for efficient storage and retrieval of large, complex data sets, and is especially well suited to data with complicated schemas that is used for analytics.

One of Parquet’s primary advantages is its ability to compress and encode data in order to reduce the disk space required to store it. This improves storage efficiency and speeds up data retrieval. Parquet supports several encodings, including run-length encoding (RLE), dictionary encoding, and plain encoding, which can be used to optimise storage and retrieval.

Many big data tools and platforms, including Apache Hadoop, Apache Spark, and Apache Impala, support Parquet. Many data processing libraries, including Pandas, support it as well.
The pd.read_parquet() function and the DataFrame.to_parquet() method let you read and write Parquet data. Both need an optional Parquet engine, such as pyarrow or fastparquet, to be installed.
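
A short round trip might look like the sketch below. It assumes a Parquet engine (pyarrow or fastparquet) is installed and falls back to a message if not; the sample data and the file name are made up. Reading back a single column illustrates the benefit of the columnar layout.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"ID": [1, 2, 3], "Age": [10, 20, 30]})

try:
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "data.parquet")  # hypothetical file name
        df.to_parquet(path, index=False)

        # Only the requested column is loaded from the file
        ages = pd.read_parquet(path, columns=["Age"])
    print(ages)
except ImportError:
    # Raised when no Parquet engine is available
    ages = None
    print("No Parquet engine found; install pyarrow or fastparquet")
```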

Pandas Data Input and Output

Pandas is a powerful Python library that lets you read and write data to and from a wide range of file types and data sources. Here is a list of some of the file types and data sources that Pandas can read and write:

  • Pandas Input Data Types:
    • CSV (Comma Separated Values) using pd.read_csv()
    • Excel using pd.read_excel()
    • SQL using pd.read_sql()
    • JSON using pd.read_json()
    • HTML using pd.read_html()
    • SAS using pd.read_sas()
    • STATA using pd.read_stata()
    • HDF5 using pd.read_hdf()
    • Pickle using pd.read_pickle()
    • SQLite using pd.read_sql() with a sqlite3 connection
    • Parquet using pd.read_parquet()
    • and many more.
  • Pandas Output Data Types:
    • CSV using df.to_csv()
    • Excel using df.to_excel()
    • SQL using df.to_sql()
    • JSON using df.to_json()
    • HTML using df.to_html()
    • SAS: not supported for writing (Pandas has no df.to_sas() method)
    • STATA using df.to_stata()
    • HDF5 using df.to_hdf()
    • Pickle using df.to_pickle()
    • SQLite using df.to_sql() with a sqlite3 connection
    • Parquet using df.to_parquet()
    • and many more.
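
Which readers and writers are available depends on the installed Pandas version, so it can be worth checking directly. The sketch below only uses dir() and hasattr() and assumes nothing beyond Pandas being importable.

```python
import pandas as pd

# Top-level reader functions, e.g. read_csv, read_excel, read_sql
readers = sorted(name for name in dir(pd) if name.startswith("read_"))

# DataFrame methods starting with "to_"; besides file writers such as
# to_csv this also lists in-memory conversions such as to_dict
writers = sorted(name for name in dir(pd.DataFrame) if name.startswith("to_"))

print(readers)
print(writers)

# Readers and writers do not always come in pairs
print(hasattr(pd, "read_sas"), hasattr(pd.DataFrame, "to_sas"))
```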

Input Data

Assume we have the following input CSV data, saved as data/data.csv:

ID,Name,Age
1,AAA,10
2,BBB,20
3,CCC,30
4,DDD,40
5,EEE,50

Pandas CSV Tutorial

In the following example we will read data from a CSV file, do some data manipulation and then save the result as a new CSV file.

import pandas as pd

# Read CSV file
df = pd.read_csv('data/data.csv')

print(df.columns)

# Perform data manipulation
df['new_column'] = df['ID'] + df['Age']

# Write CSV file
df.to_csv('data_modified.csv', index=False)


print(df)

The result is:

Index(['ID', 'Name', 'Age'], dtype='object')
   ID Name  Age  new_column
0   1  AAA   10          11
1   2  BBB   20          22
2   3  CCC   30          33
3   4  DDD   40          44
4   5  EEE   50          55
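
The same steps can be tried without creating any files. This sketch assumes the sample data from the Input Data section and uses io.StringIO in place of data/data.csv; the row filter on Age is an extra step added for illustration.

```python
import io

import pandas as pd

# Sample data from the Input Data section, held in memory
csv_text = """ID,Name,Age
1,AAA,10
2,BBB,20
3,CCC,30
4,DDD,40
5,EEE,50
"""

df = pd.read_csv(io.StringIO(csv_text))

# Add a column, then keep only the rows where Age is above 20
df["new_column"] = df["ID"] + df["Age"]
filtered = df[df["Age"] > 20]

# With no path argument, to_csv returns the CSV text as a string
print(filtered.to_csv(index=False))
```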

Pandas Excel Tutorial

In the following example we will read data from an Excel file, do some data manipulation and then save the result as a new Excel file. Here, column1 and column2 are placeholder column names.

import pandas as pd

# Read Excel file
df = pd.read_excel('data.xlsx')

# Perform data manipulation
df['new_column'] = df['column1'] + df['column2']

# Write Excel file
df.to_excel('data_modified.xlsx', index=False)
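
For a version that can be run as-is, the sketch below writes a small made-up DataFrame to a temporary .xlsx file and reads it back by sheet name. It assumes an Excel package such as openpyxl is installed and prints a message otherwise; the file name and data are hypothetical.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"column1": [1, 2], "column2": [10, 20]})

try:
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "data.xlsx")  # hypothetical file name

        # A workbook can hold several sheets; name the one being written
        df.to_excel(path, sheet_name="Sheet1", index=False)

        # Read the same sheet back into a new DataFrame
        result = pd.read_excel(path, sheet_name="Sheet1")
    print(result)
except ImportError:
    # .xlsx support relies on an optional package such as openpyxl
    result = None
    print("No Excel engine found; install openpyxl")
```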

Pandas SQL Tutorial

In the following example we will read data from an SQLite database file and do some data manipulation on it.

import pandas as pd
import sqlite3

# Connect to SQLite database
conn = sqlite3.connect('data.db')

# Read SQL query
df = pd.read_sql('SELECT * FROM data', conn)

# Perform data manipulation
df['new_column'] = df['column1'] + df['column2']
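
A self-contained variant of the SQL workflow is sketched below. It uses an in-memory SQLite database, so no data.db file is needed; the table contents and the data_modified table name are made up for illustration.

```python
import sqlite3

import pandas as pd

# In-memory database with a hypothetical table called "data"
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (column1 INTEGER, column2 INTEGER)")
conn.executemany("INSERT INTO data VALUES (?, ?)", [(1, 10), (2, 20)])
conn.commit()

# Read the query result into a DataFrame
df = pd.read_sql("SELECT * FROM data", conn)
df["new_column"] = df["column1"] + df["column2"]

# Write the result to a new table, replacing it if it already exists
df.to_sql("data_modified", conn, index=False, if_exists="replace")

# Read the new table back to confirm the write
check = pd.read_sql("SELECT new_column FROM data_modified", conn)
print(check)

conn.close()
```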

Pandas JSON Tutorial

In the following example we will read data from a JSON file, do some data manipulation and then save the result as a new JSON file.

import pandas as pd

# Read JSON file
df = pd.read_json('data.json')

# Perform data manipulation
df['new_column'] = df['column1'] + df['column2']

# Write JSON file
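# orient='records' writes one object per row and leaves out the index
# (index=False raises a ValueError with the default orient)
df.to_json('data_modified.json', orient='records')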
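
The layout of the JSON written by to_json() depends on its orient parameter. The sketch below, using made-up data, compares the default layout for a DataFrame with orient="records"; json.loads is used only to make the structure easy to inspect.

```python
import json

import pandas as pd

df = pd.DataFrame({"column1": [1, 2], "column2": [10, 20]})

# Default for a DataFrame: one object per column, keyed by index label
by_column = json.loads(df.to_json())

# orient="records": a list with one object per row, index left out
by_row = json.loads(df.to_json(orient="records"))

print(by_column)
print(by_row)
```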

Pandas HTML Tutorial

In the following example we will read a table from an HTML file, do some data manipulation and then save the result as a new HTML file.

import pandas as pd

# Read HTML table
df = pd.read_html('data.html')[0]

# Perform data manipulation
df['new_column'] = df['column1'] + df['column2']

# Write HTML table
df.to_html('data_modified.html')
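
The [0] in the example above is needed because pd.read_html() returns a list with one DataFrame per table found. The sketch below shows this with generated markup rather than a file; the data is made up, and parsing assumes an HTML parser such as lxml is installed.

```python
import io

import pandas as pd

df = pd.DataFrame({"column1": [1, 2], "column2": [10, 20]})

# Without a path, to_html returns the <table> markup as a string
html = df.to_html(index=False)
print(html)

try:
    # One DataFrame per <table> element in the input
    tables = pd.read_html(io.StringIO(html))
    print(len(tables))
except ImportError:
    # Parsing needs lxml, or beautifulsoup4 together with html5lib
    tables = None
    print("No HTML parser found; install lxml")
```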

Pandas SAS Tutorial

In the following example we will read data from a SAS file and do some data manipulation. Pandas can read SAS files but has no SAS writer (there is no df.to_sas() method), so the result is saved as CSV instead.

import pandas as pd

# Read SAS file
df = pd.read_sas('data.sas7bdat')

# Perform data manipulation
df['new_column'] = df['column1'] + df['column2']

# Pandas cannot write SAS files, so save the result as CSV
df.to_csv('data_modified.csv', index=False)

Summary

Pandas is a robust Python library that lets you read data from a wide range of file formats and data sources, including CSV, Excel, SQL, JSON, HTML, SAS, STATA, HDF5, Pickle, SQLite, Parquet, and many others, and write to most of them (SAS is read-only). To read a file, use functions like pd.read_csv(), pd.read_excel(), pd.read_json(), and so on. To write a file, use DataFrame methods like to_csv(), to_excel(), to_json(), and so on. You can also manipulate the data before saving it to a new file, for example by adding columns or filtering rows.
