In this post we will dive into Pandas data input and output.
This post will teach you how to read and write data to and from a variety of file types, including CSV, Excel, SQL, HTML, Parquet, and JSON. You’ll also learn how to work with data from other sources, such as databases and web sites.
Introduction
Welcome to the Pandas data input and output department! In this section, we’ll look at how to read and write data to and from several file formats.
We’ll go through common formats including CSV, Excel, SQL, HTML, Parquet, and JSON. We will also look at ways of working with data from other sources, such as databases and web pages.
By the end of this post, you will have a solid grasp of how to work with data in Pandas and will be able to import and export data in a number of formats.
Most Common Data Formats
CSV
CSV stands for “Comma Separated Values”. It is a file format for storing tabular data in plain text. Each line of a CSV file represents a row, and the values inside a row are separated by commas. As a result, it is a simple yet effective format for storing and exchanging data. The first line of a CSV file often holds the column headers, which identify the fields in the data. Numerous programmes, including Microsoft Excel and Google Sheets, and many programming languages, including Python, support CSV.
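To make the header and row structure concrete, here is a minimal sketch that parses a small CSV string with pd.read_csv(). The sample text and variable names are made up for illustration, and io.StringIO stands in for a file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV text: the first line holds the column headers,
# every following line is one row of comma-separated values.
csv_text = "ID,Name,Age\n1,AAA,10\n2,BBB,20\n"

# io.StringIO lets read_csv treat the string like an open file.
df = pd.read_csv(io.StringIO(csv_text))

print(list(df.columns))  # column names taken from the header line
print(len(df))           # number of data rows
```

Passing a file path instead of the StringIO object works the same way.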
Excel
Excel is a spreadsheet programme created by Microsoft. It is used to create and manage several kinds of data, including numbers, text, and formulae. Excel files end in “.xls” or “.xlsx”, and they hold data in a tabular format, akin to a table in a relational database. Each sheet in an Excel workbook represents a table, and each cell holds a single value. Excel has a plethora of built-in data manipulation and analysis capabilities, such as sorting, filtering, and graphing.
JSON
JSON is an abbreviation for JavaScript Object Notation. It is a lightweight data-interchange format that is simple for people to read and write while also being simple for machines to parse and generate. JSON is a language-independent text format that uses conventions familiar to programmers of the C family of languages, which includes C, C++, C#, Java, JavaScript, Perl, Python, and many more. Because of these characteristics, JSON is an excellent data-interchange language.
A JSON object is a set of key-value pairs with strings as keys and values that can be strings, numbers, booleans, null, arrays, or other JSON objects. JSON data is expressed using the JavaScript object literal syntax. JSON data is frequently used to send information between a server and a web application, or between various parts of a web service.
Pandas supports reading and writing JSON data using the pd.read_json() function and the DataFrame.to_json() method, respectively. This enables you to work with JSON data in Python and carry out data manipulation and analysis with ease.
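As a rough illustration of that round trip, the sketch below converts a small DataFrame to JSON text and reads it back. The sample data is hypothetical, and orient="records" is just one of several layouts that to_json() supports.

```python
import io

import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"ID": [1, 2], "Name": ["AAA", "BBB"]})

# With no path argument, to_json returns the JSON text as a string.
# orient="records" produces a list with one object per row.
json_text = df.to_json(orient="records")
print(json_text)

# Read the same text back into a DataFrame.
df_back = pd.read_json(io.StringIO(json_text))
print(df_back)
```

Passing a file path to to_json() and read_json() writes to and reads from disk instead.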
Parquet
Parquet is a columnar storage format for big data. It is intended to make storing and retrieving large and complex data sets efficient, and it is especially well-suited to large data sets with complicated schemas that are used for analytics.
One of Parquet’s primary advantages is its ability to compress and encode data in order to decrease the amount of disc space required to store it. This increases storage efficiency and accelerates data retrieval. Furthermore, Parquet supports a variety of encoding techniques, including run-length encoding (RLE), dictionary encoding, and plain encoding, which can be used to optimise storage and retrieval.
Many big data tools and platforms, including Apache Hadoop, Apache Spark, and Apache Impala, support Parquet. Many data processing frameworks, including Pandas, support it as well.
The pd.read_parquet() function and the DataFrame.to_parquet() method in Pandas allow you to read and write Parquet data. This enables you to work with Parquet data and carry out data manipulation and analysis using Python.
Pandas Data Input and Output
Pandas is a sophisticated Python module that lets you read and write data to and from a wide range of file types and data sources. Here is a list of some of the file types and data sources that Pandas can read and write:
- Pandas Input Data Types:
  - CSV (Comma Separated Values) using pd.read_csv()
  - Excel using pd.read_excel()
  - SQL using pd.read_sql()
  - JSON using pd.read_json()
  - HTML using pd.read_html()
  - SAS using pd.read_sas()
  - STATA using pd.read_stata()
  - HDF5 using pd.read_hdf()
  - Pickle using pd.read_pickle()
  - SQLite using pd.read_sql() with a sqlite3 connection
  - Parquet using pd.read_parquet()
  - and many more.
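- Pandas Output Data Types:
  - CSV using DataFrame.to_csv()
  - Excel using DataFrame.to_excel()
  - JSON using DataFrame.to_json()
  - HTML using DataFrame.to_html()
  - Parquet using DataFrame.to_parquet()
  - and many more.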
Input Data
Assume we have the following input CSV data:
ID,Name,Age
1,AAA,10
2,BBB,20
3,CCC,30
4,DDD,40
5,EEE,50
Pandas CSV Tutorial
In the following example we will read data from a CSV file, do some data manipulation and then save it again in CSV format.
import pandas as pd

# Read CSV file
df = pd.read_csv('data/data.csv')
print(df.columns)

# Perform data manipulation
df['new_column'] = df['ID'] + df['Age']

# Write CSV file
df.to_csv('data_modified.csv', index=False)
print(df)
The result is:
Index(['ID', 'Name', 'Age'], dtype='object')
   ID Name  Age  new_column
0   1  AAA   10          11
1   2  BBB   20          22
2   3  CCC   30          33
3   4  DDD   40          44
4   5  EEE   50          55
Pandas Excel Tutorial
In the following example we will read data from an Excel file, do some data manipulation and then save it again in Excel format.
import pandas as pd

# Read Excel file
df = pd.read_excel('data.xlsx')

# Perform data manipulation
df['new_column'] = df['column1'] + df['column2']

# Write Excel file
df.to_excel('data_modified.xlsx', index=False)
Pandas SQL Tutorial
In the following example we will read data from a SQLite database file and do some data manipulation on it.
import pandas as pd
import sqlite3

# Connect to SQLite database
conn = sqlite3.connect('data.db')

# Read SQL query
df = pd.read_sql('SELECT * FROM data', conn)

# Perform data manipulation
# ('column2' is an assumed column name, matching the other examples)
df['new_column'] = df['column1'] + df['column2']

# Close the connection
conn.close()
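For a version that runs without an existing database file, here is a minimal sketch using an in-memory SQLite database. The table names data and data_modified and the columns column1 and column2 are assumptions chosen to match the other examples in this post.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table "data" with two numeric columns.
source = pd.DataFrame({"column1": [1, 2, 3], "column2": [10, 20, 30]})
source.to_sql("data", conn, index=False)

# Read the table back with a SQL query.
df = pd.read_sql("SELECT * FROM data", conn)

# Add a column that sums the two existing ones.
df["new_column"] = df["column1"] + df["column2"]

# Write the result to a second table, count its rows, then close.
df.to_sql("data_modified", conn, index=False)
count_df = pd.read_sql("SELECT COUNT(*) AS n FROM data_modified", conn)
row_count = int(count_df["n"].iloc[0])
conn.close()

print(df)
print(row_count)
```

DataFrame.to_sql() is the writing counterpart of pd.read_sql(), so the same connection object serves both directions.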
Pandas JSON Tutorial
In the following example we will read data from a JSON file, do some data manipulation and then save it again in JSON format.
import pandas as pd

# Read JSON file
df = pd.read_json('data.json')

# Perform data manipulation
df['new_column'] = df['column1'] + df['column2']

# Write JSON file as a list of records, one object per row
# (index=False is rejected with the default orient)
df.to_json('data_modified.json', orient='records')
Pandas HTML Tutorial
In the following example we will read data from an HTML file, do some data manipulation and then save it again in HTML format.
import pandas as pd

# Read HTML tables; read_html returns a list of DataFrames,
# so take the first table on the page
df = pd.read_html('data.html')[0]

# Perform data manipulation
df['new_column'] = df['column1'] + df['column2']

# Write HTML table
df.to_html('data_modified.html')
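To see what the written HTML looks like without touching the file system, the sketch below renders a small DataFrame to an HTML string. The sample data is hypothetical. Note that pd.read_html() needs an HTML parser library such as lxml installed, while to_html() does not.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"column1": [1, 2], "column2": [10, 20]})
df["new_column"] = df["column1"] + df["column2"]

# With no path argument, to_html returns the markup as a string.
# index=False leaves the row labels out of the table.
html_text = df.to_html(index=False)

print(html_text)
```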
Pandas SAS Tutorial
In the following example we will read data from a SAS file and do some data manipulation. Pandas can read SAS files but has no method for writing them, so the result is saved as CSV instead.
import pandas as pd

# Read SAS file
df = pd.read_sas('data.sas7bdat')

# Perform data manipulation
df['new_column'] = df['column1'] + df['column2']

# Pandas has no SAS writer, so save the result as CSV
df.to_csv('data_modified.csv', index=False)
Summary
Pandas is a robust Python library that lets you read and write data from a wide range of file formats and data sources, including CSV, Excel, SQL, JSON, HTML, SAS, STATA, HDF5, Pickle, SQLite, Parquet, and many others. To read a file, use functions like pd.read_csv(), pd.read_excel(), pd.read_json(), and so on. To write a file, use DataFrame methods like to_csv(), to_excel(), to_json(), and so on. Some formats, such as SAS, can be read but not written. You can also manipulate the data before saving it to a new file, for example by adding columns or filtering rows.
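The whole read, manipulate, write cycle can be sketched in a few lines. The data, file names and the Age threshold below are made up for illustration, and a temporary directory is used so the example leaves no files behind.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data, similar to the CSV used earlier in the post.
df = pd.DataFrame(
    {"ID": [1, 2, 3], "Name": ["AAA", "BBB", "CCC"], "Age": [10, 20, 30]}
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data.csv")

    # Write the data, then read it back.
    df.to_csv(path, index=False)
    loaded = pd.read_csv(path)

    # Add a column and keep only the rows where Age is above 15.
    loaded["new_column"] = loaded["ID"] + loaded["Age"]
    filtered = loaded[loaded["Age"] > 15]

    # Save the filtered rows to a new file and load it to check the result.
    out_path = os.path.join(tmp_dir, "data_filtered.csv")
    filtered.to_csv(out_path, index=False)
    result = pd.read_csv(out_path)

print(result)
```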