Movatterモバイル変換


[0]ホーム

URL:


Python Pandas Tutorial

Python Pandas - IO Tools



The Pandas library offers powerful I/O tools (API) for data import and export, enabling seamless handling of various file formats likeCSV,Excel,JSON, and many more. This API includes top-level reader functions like,pd.read_csv(),read_clipboard() and corresponding writer methods like,to_csv(),to_clipboard() for easy data handling.

In this tutorial, we will learn about the overview of the Pandas I/O tools and learn how to use them effectively.

Overview of Pandas IO Tools

The Pandas I/O API supports a wide variety of data formats. Here is a summary of supported formats and their corresponding reader and writer functions −

FormatReader FunctionWriter Function
Tabular Dataread_table()NA
CSVread_csv()to_csv()
Fixed-Width Text Fileread_fwf()NA
Clipboardread_clipboard()to_clipboard()
Picklingread_pickle()to_pickle()
Excelread_excel()to_excel()
JSONread_json()to_json()
HTMLread_html()to_html()
XMLread_xml()to_xml()
LaTeXNAto_latex()
HDF5 Formatread_hdf()to_hdf()
Featherread_feather()to_feather()
Parquetread_parquet()to_parquet()
ORCread_orc()to_orc()
SQLread_sql()to_sql()
Stataread_stata()to_stata()

Among these, the most frequently used functions for handling text files areread_csv() andread_table(). Both convert flat files into DataFrame objects.

Example: Reading CSV Data

This example shows reading the CSV data using the pandasread_csv() function. In this example we are using the StringIO to load the CSV string into a Pandas DataFrame object.

import pandas as pd# Import StringIO to load a file-like object for reading CSVfrom io import StringIO# Create string representing CSV datadata = """S.No,Name,Age,City,Salary1,Tom,28,Toronto,200002,Lee,32,HongKong,30003,Steven,43,Bay Area,83004,Ram,38,Hyderabad,3900"""# Use StringIO to convert the string data into a file-like objectobj = StringIO(data)# read CSV into a Pandas DataFramedf = pd.read_csv(obj)print(df)

Itsoutput is as follows −

S.NoNameAgeCitySalary
01Tom28Toronto20000
12Lee32HongKong3000
23Steven43Bay Area8300
34Ram38Hyderabad3900

Customizing Parsing Options

Pandas allows several customization options when parsing data. You can modify how the data is parsed using parameters like −

  • Index_col

  • dtype

  • names

  • skiprows

Below we will discuss about the common parsing options for customization.

Customizing the index

You can customize the row labels or index of the Pandas object by usingindex_col parameter. Settingindex_col=False forces Pandas to not use the first column as the index, which can be helpful when handling malformed files with extra delimiters.

Example

This example uses theindex_col parameter to customize the row labels while reading the CSV data.

import pandas as pd# Import StringIO to load a file-like object for reading CSVfrom io import StringIO# Create string representing CSV datadata = """S.No,Name,Age,City,Salary1,Tom,28,Toronto,200002,Lee,32,HongKong,30003,Steven,43,Bay Area,83004,Ram,38,Hyderabad,3900"""# Use StringIO to convert the string data into a file-like objectobj = StringIO(data)# read CSV into a Pandas DataFramedf = pd.read_csv(obj, index_col=['S.No'])# Display the DataFrameprint(df)

Itsoutput is as follows −

S.NoNameAgeCitySalary
1Tom28Toronto20000
2Lee32HongKong3000
3Steven43Bay Area8300
4Ram38Hyderabad3900

Converters

Pandas also provides the ability to specify the data type for columns using thedtype parameter. You can convert columns to specific types like {'Col_1': np.float64, 'Col_2': np.int32, 'Col3': 'Int64'}.

Example

This example customizes the data type of aJSON data while parsing the data using theread_json() method with thedtype parameter.

import pandas as pdfrom io import StringIOimport numpy as np# Create a string representing JSON datadata = """[    {"Name": "Braund", "Gender": "Male", "Age": 30},    {"Name": "Cumings", "Gender": "Female", "Age": 25},    {"Name": "Heikkinen", "Gender": "Female", "Age": 35}]"""# Use StringIO to convert the JSON-formatted string data into a file-like objectobj = StringIO(data)# Read JSON into a Pandas DataFramedf = pd.read_json(obj, dtype={'Age': np.float64})# Display the DataFrameprint(df.dtypes)

Itsoutput is as follows −

Name       objectGender     objectAge       float64dtype: object

By default, thedtype of the 'Age' column isint, but the result shows it asfloat because we have explicitly casted the type.

Thus, the data looks like float −

NameGenderAge
0BraundMale30.0
1CumingsFemale25.0
2HeikkinenFemale35.0

Customizing the Header Names

When reading data files, Pandas assumes the first row as the header. However, you can customize this using thenames Parameter to provide custom column names.

Example

This example reads theXML data into a Pandas DataFrame object by customizing the header names using thenames parameter of theread_xml() method.

import pandas as pdfrom io import StringIO# Create a String representing XML data xml = """<?xml version="1.0" encoding="UTF-8"?><bookstore>  <book category="cooking">    <title lang="en">Everyday Italian</title>    <author>Giada De Laurentiis</author>    <year>2005</year>    <price>30.00</price>  </book>  <book category="children">    <title lang="en">Harry Potter</title>    <author>J K. Rowling</author>    <year>2005</year>    <price>29.99</price>  </book>  <book category="web">    <title lang="en">Learning XML</title>    <author>Erik T. Ray</author>    <year>2003</year>    <price>39.95</price>  </book></bookstore>"""# Parse the XML data with custom column namesdf = pd.read_xml(StringIO(xml), names=['a', 'b', 'c','d','e'])# Display the Output DataFrameprint('Output DataFrame from XML:')print(df)

Itsoutput is as follows −

Output DataFrame from XML:
abcde
0cookingEveryday ItalianGiada De Laurentiis200530.00
1childrenHarry PotterJ K. Rowling200529.99
2webLearning XMLErik T. Ray200339.95

Example: Reading with custom column names and header row

If the header is in a row other than the first, pass the row number to header. This will skip the preceding rows.

import pandas as pd# Import StringIO to load a file-like object for reading CSVfrom io import StringIO# Create string representing CSV datadata = """S.No,Name,Age,City,Salary1,Tom,28,Toronto,200002,Lee,32,HongKong,30003,Steven,43,Bay Area,83004,Ram,38,Hyderabad,3900"""# Use StringIO to convert the string data into a file-like objectobj = StringIO(data)# read CSV into a Pandas DataFramedf = pd.read_csv(obj, names=['a', 'b', 'c','d','e'], header=0)# Display the DataFrameprint(df)

Itsoutput is as follows −

abcde
0S.NoNameAgeCitySalary
11Tom28Toronto20000
22Lee32HongKong3000
33Steven43Bay Area8300
44Ram38Hyderabad3900

Skipping Rows

Theskiprows parameter allows you to skip a specific number of rows or line numbers when reading a file. It can also accept a callable function to decide which rows to skip based on conditions.

Example

This example shows skipping the rows of a input data while parsing.

import pandas as pd# Import StringIO to load a file-like object for reading CSVfrom io import StringIO# Create string representing CSV datadata = """S.No,Name,Age,City,Salary1,Tom,28,Toronto,200002,Lee,32,HongKong,30003,Steven,43,Bay Area,83004,Ram,38,Hyderabad,3900"""# Use StringIO to convert the string data into a file-like objectobj = StringIO(data)    # read CSV into a Pandas DataFramedf = pd.read_csv(obj, skiprows=2)# Display the DataFrameprint(df)

Itsoutput is as follows −

S.NoNameAgeCitySalary
2Lee32HongKong3000
03Steven43Bay Area8300
14Ram38Hyderabad3900
Print Page
Advertisements

[8]ページ先頭

©2009-2025 Movatter.jp