Python Pandas - IO Tools

The Pandas library provides a powerful I/O API for importing and exporting data in a wide range of file formats, such as CSV, Excel, JSON, and many more. The API consists of top-level reader functions such as pd.read_csv() and pd.read_clipboard(), and corresponding writer methods such as to_csv() and to_clipboard().

This tutorial gives an overview of the Pandas I/O tools and shows how to use them effectively.

Overview of Pandas IO Tools

The Pandas I/O API supports a wide variety of data formats. Here is a summary of supported formats and their corresponding reader and writer functions −

Format	Reader Function	Writer Function
Tabular Data	read_table()	NA
CSV	read_csv()	to_csv()
Fixed-Width Text File	read_fwf()	NA
Clipboard	read_clipboard()	to_clipboard()
Pickling	read_pickle()	to_pickle()
Excel	read_excel()	to_excel()
JSON	read_json()	to_json()
HTML	read_html()	to_html()
XML	read_xml()	to_xml()
LaTeX	NA	to_latex()
HDF5 Format	read_hdf()	to_hdf()
Feather	read_feather()	to_feather()
Parquet	read_parquet()	to_parquet()
ORC	read_orc()	to_orc()
SQL	read_sql()	to_sql()
Stata	read_stata()	to_stata()

Among these, the most frequently used functions for handling text files are read_csv() and read_table(). Both convert flat files into DataFrame objects.
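As a minimal sketch of how the two functions relate: read_csv() splits on commas by default and read_table() splits on tabs, so either one can read the other's format when sep is passed explicitly. The sample strings below are made up for illustration.

```python
import pandas as pd
from io import StringIO

# Made-up sample data: the same two rows, comma- and tab-delimited
csv_text = "Name,Age\nTom,28\nLee,32\n"
tsv_text = "Name\tAge\nTom\t28\nLee\t32\n"

# read_csv() splits on commas by default
df_csv = pd.read_csv(StringIO(csv_text))

# read_table() splits on tabs by default
df_tsv = pd.read_table(StringIO(tsv_text))

# Passing sep explicitly lets read_csv() parse the tab-delimited text too
df_sep = pd.read_csv(StringIO(tsv_text), sep="\t")

# Both comparisons check that the parsed frames hold the same data
print(df_csv.equals(df_tsv))
print(df_csv.equals(df_sep))
```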

Example: Reading CSV Data

This example reads CSV data using the pandas read_csv() function. StringIO wraps the CSV string in a file-like object so that it can be loaded into a Pandas DataFrame.

import pandas as pd

# Import StringIO to load a file-like object for reading CSV
from io import StringIO

# Create string representing CSV data
data = """S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,HongKong,3000
3,Steven,43,Bay Area,8300
4,Ram,38,Hyderabad,3900"""

# Use StringIO to convert the string data into a file-like object
obj = StringIO(data)

# Read CSV into a Pandas DataFrame
df = pd.read_csv(obj)
print(df)

Its output is as follows −

	S.No	Name	Age	City	Salary
0	1	Tom	28	Toronto	20000
1	2	Lee	32	HongKong	3000
2	3	Steven	43	Bay Area	8300
3	4	Ram	38	Hyderabad	3900
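The reader and writer pairs listed in the overview table can be combined into a round trip. The following is a minimal sketch, assuming a small made-up DataFrame: calling to_csv() without a path returns the CSV text as a string, which read_csv() can then parse back.

```python
import pandas as pd
from io import StringIO

# Made-up sample data for illustration
df = pd.DataFrame({"Name": ["Tom", "Lee"], "Age": [28, 32]})

# With no path argument, to_csv() returns the CSV text as a string;
# index=False leaves the row labels out of the output
csv_text = df.to_csv(index=False)
print(csv_text)

# Read the text back with the matching reader function
restored = pd.read_csv(StringIO(csv_text))
print(restored)
```

Writing to a real file works the same way, with a file path passed to both functions instead of a string buffer.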

Customizing Parsing Options

Pandas allows several customization options when parsing data. You can modify how the data is parsed using parameters like −

index_col
dtype
names
skiprows

The sections below cover these common parsing options.

Customizing the index

You can customize the row labels (index) of the Pandas object with the index_col parameter. Setting index_col=False forces Pandas not to use the first column as the index, which can be helpful when handling malformed files that have an extra delimiter at the end of each line.

Example

This example uses the index_col parameter to customize the row labels while reading the CSV data.

import pandas as pd

# Import StringIO to load a file-like object for reading CSV
from io import StringIO

# Create string representing CSV data
data = """S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,HongKong,3000
3,Steven,43,Bay Area,8300
4,Ram,38,Hyderabad,3900"""

# Use StringIO to convert the string data into a file-like object
obj = StringIO(data)

# Read CSV into a Pandas DataFrame, using the S.No column as the index
df = pd.read_csv(obj, index_col=['S.No'])

# Display the DataFrame
print(df)

Its output is as follows −

S.No	Name	Age	City	Salary
1	Tom	28	Toronto	20000
2	Lee	32	HongKong	3000
3	Steven	43	Bay Area	8300
4	Ram	38	Hyderabad	3900
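The index_col=False case mentioned above can be sketched with a small made-up string in which every data row ends with an extra comma. This is an illustrative sketch, not a general recipe for repairing malformed files. Some pandas versions warn about the length mismatch, so warnings are silenced here.

```python
import warnings
import pandas as pd
from io import StringIO

# Made-up malformed data: each data row ends with an extra comma,
# so it has one more field than the header
data = "a,b,c\n4,apple,bat,\n8,orange,cow,"

# Default behaviour: the first column is taken as the index
df_default = pd.read_csv(StringIO(data))
print(df_default)

# index_col=False keeps the first column as regular data
with warnings.catch_warnings():
    # Some pandas versions emit a ParserWarning for this input
    warnings.simplefilter("ignore")
    df_fixed = pd.read_csv(StringIO(data), index_col=False)
print(df_fixed)
```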

Specifying Data Types

Pandas also lets you specify the data type of individual columns using the dtype parameter. You can pass a dictionary that maps column names to types, such as {'Col_1': np.float64, 'Col_2': np.int32, 'Col_3': 'Int64'}.

Example

This example sets the data type of a column while parsing JSON data, using the read_json() method with the dtype parameter.

import pandas as pd
from io import StringIO
import numpy as np

# Create a string representing JSON data
data = """[
    {"Name": "Braund", "Gender": "Male", "Age": 30},
    {"Name": "Cumings", "Gender": "Female", "Age": 25},
    {"Name": "Heikkinen", "Gender": "Female", "Age": 35}
]"""

# Use StringIO to convert the JSON-formatted string data into a file-like object
obj = StringIO(data)

# Read JSON into a Pandas DataFrame
df = pd.read_json(obj, dtype={'Age': np.float64})

# Display the column data types
print(df.dtypes)

Its output is as follows −

Name       object
Gender     object
Age       float64
dtype: object

By default, the dtype of the 'Age' column would be int64, but the result shows float64 because the type was cast explicitly.

As a result, the values are displayed as floats −

	Name	Gender	Age
0	Braund	Male	30.0
1	Cumings	Female	25.0
2	Heikkinen	Female	35.0
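The same dtype dictionary works with read_csv(). The sketch below uses made-up data with one missing value to show why the nullable 'Int64' type can be useful: a plain integer column cannot hold a missing entry, while 'Int64' can.

```python
import numpy as np
import pandas as pd
from io import StringIO

# Made-up CSV data; the Salary field is empty in the second row
data = """Name,Age,Salary
Tom,28,20000
Lee,32,
Ram,38,3900"""

# Map column names to the desired types; 'Int64' (capital I) is the
# nullable integer type, which can hold missing values
df = pd.read_csv(StringIO(data), dtype={"Age": np.float64, "Salary": "Int64"})

print(df.dtypes)
print(df)
```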

Customizing the Header Names

When reading data files, Pandas treats the first row as the header by default. You can override this with the names parameter, which supplies custom column names.

Example

This example reads XML data into a Pandas DataFrame and customizes the header names using the names parameter of the read_xml() method.

import pandas as pd
from io import StringIO

# Create a string representing XML data
xml = """<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="children">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="web">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>"""

# Parse the XML data with custom column names
df = pd.read_xml(StringIO(xml), names=['a', 'b', 'c','d','e'])

# Display the output DataFrame
print('Output DataFrame from XML:')
print(df)

Its output is as follows −

Output DataFrame from XML:

	a	b	c	d	e
0	cooking	Everyday Italian	Giada De Laurentiis	2005	30.00
1	children	Harry Potter	J K. Rowling	2005	29.99
2	web	Learning XML	Erik T. Ray	2003	39.95
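The names parameter is also the usual way to label a file that has no header row at all. A minimal sketch with made-up data follows; the column names id, name, and age are chosen only for illustration.

```python
import pandas as pd
from io import StringIO

# Made-up CSV data with no header row
data = """1,Tom,28
2,Lee,32
3,Ram,38"""

# Because names is given and header is not, the first line is
# treated as data rather than as column labels
df = pd.read_csv(StringIO(data), names=["id", "name", "age"])
print(df)
```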

Example: Reading with custom column names and header row

If the file already contains a header row, pass header=0 together with names so that the existing header is replaced by the custom names instead of being read as a data row. More generally, if the header is in a row other than the first, pass its row number to header; the rows before it are skipped.

import pandas as pd

# Import StringIO to load a file-like object for reading CSV
from io import StringIO

# Create string representing CSV data
data = """S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,HongKong,3000
3,Steven,43,Bay Area,8300
4,Ram,38,Hyderabad,3900"""

# Use StringIO to convert the string data into a file-like object
obj = StringIO(data)

# Read CSV into a Pandas DataFrame, replacing the header row with custom names
df = pd.read_csv(obj, names=['a', 'b', 'c','d','e'], header=0)

# Display the DataFrame
print(df)

Its output is as follows −

	a	b	c	d	e
0	1	Tom	28	Toronto	20000
1	2	Lee	32	HongKong	3000
2	3	Steven	43	Bay Area	8300
3	4	Ram	38	Hyderabad	3900
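The case of a header that is not on the first line can be sketched as follows. The two preamble lines are made up for illustration; header=2 points at the third line, counting from zero.

```python
import pandas as pd
from io import StringIO

# Made-up data: two preamble lines come before the real header
data = """Report exported by a hypothetical tool
Units: local currency
S.No,Name,Salary
1,Tom,20000
2,Lee,3000"""

# header=2 uses the third line (0-based row 2) as the column labels
# and discards the two lines before it
df = pd.read_csv(StringIO(data), header=2)
print(df)
```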

Skipping Rows

The skiprows parameter lets you skip rows when reading a file. It accepts either a number of rows to skip from the start of the file or a list of 0-based line numbers. It can also accept a callable that decides which rows to skip based on the row index.

Example

This example skips rows of the input data while parsing. Because the first two lines (the original header and the first record) are skipped, the third line is used as the header.

import pandas as pd

# Import StringIO to load a file-like object for reading CSV
from io import StringIO

# Create string representing CSV data
data = """S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,HongKong,3000
3,Steven,43,Bay Area,8300
4,Ram,38,Hyderabad,3900"""

# Use StringIO to convert the string data into a file-like object
obj = StringIO(data)

# Read CSV into a Pandas DataFrame, skipping the first two lines
df = pd.read_csv(obj, skiprows=2)

# Display the DataFrame
print(df)

Its output is as follows −

	2	Lee	32	HongKong	3000
0	3	Steven	43	Bay Area	8300
1	4	Ram	38	Hyderabad	3900
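The list and callable forms of skiprows can be sketched as follows, reusing the same sample data. Line numbers are 0-based and count the header as line 0, so the header is kept in both calls.

```python
import pandas as pd
from io import StringIO

data = """S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,HongKong,3000
3,Steven,43,Bay Area,8300
4,Ram,38,Hyderabad,3900"""

# A list skips specific 0-based line numbers; line 0 (the header) is kept
df_list = pd.read_csv(StringIO(data), skiprows=[1, 3])
print(df_list)

# A callable receives each line number and returns True to skip it;
# here every even-numbered line after the header is skipped
df_callable = pd.read_csv(StringIO(data), skiprows=lambda i: i > 0 and i % 2 == 0)
print(df_callable)
```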
