Movatterモバイル変換


[0]ホーム

URL:


Python Pandas Tutorial

Python Pandas - Reading Data from an Excel File



The Pandas library provides powerful tool for data manipulation and analysis. Among its many features, it offers the ability to read and write data to Excel files easily.Excel files are widely used to store and organize data in tabular format, making them an excellent choice for analysis. Excel files can have multiple sheets, each containing rows and columns of data. Excel files usually come with extensions like.xls or.xlsx.

Pandas provides easy-to-use methods for working with Excel data directly in Python. One of such method is theread_excel() method, which reads Excel files and loads the data into a Pandas DataFrame.

In this tutorial, we will learn how to read data from Excel files using thepandas.read_excel() method, covering different scenarios like loading a single sheet, specific sheets, and multiple sheets.

Supported Excel File Formats in Pandas

Pandas uses different backends to read various Excel file formats −

  • TheExcel 2007+ (.xlsx) files can be read using theopenpyxl Python module.

  • TheExcel 2003 (.xls) files can be read using thexlrd module.

  • TheBinary Excel (.xlsb) files can be read using thepyxlsb module.

  • Additionally, all formats can be read using the Calamine engine.

Note: Please make sure that you have installed the required packages (xlrd and openpyxl) in your system. If these packages are not installed, use the following commands to install them −
pip install openpyxl
or
pip install xlrd

Reading an Excel File in Pandas

Thepandas.read_excel() method reads Excel files and loads the data into a Pandas DataFrame. This method supports multiple Excel file formats like,.xls,.xlsx,.xlsm, and more from a local filesystem or a URL.

Example

Here is a simple example of reading a local system Excel file into a DataFrame using thepandas.read_excel() method by specifying the file path.

import pandas as pd# Read an Excel filedf = pd.read_excel('data.xlsx')# Print the DataFrameprint('Output DataFrame:')print(df)

Following is the output of the above code −

Output DataFrame:
Sr.noNameGenderAge
01Braundfemale38
12Cumingsmale22
23Heikkinfemale35
34Futrellefemale26

Reading a Specific Sheet from an Excel file

The Excel files may contain multiple sheets with different names. To read a specific sheet into a Pandas DataFrame, you can specify the sheet name or index to thesheet_name parameter of thepandas.read_excel() method.

Example

The following example demonstrates how to read a specific sheet from an Excel file into a Pandas DataFrame usingpandas.read_excel() method. Here we will specify the sheet name to thesheet_name parameter to read that specific sheet.

import pandas as pd# Read a specific sheetdf = pd.read_excel('data.xlsx', sheet_name="Sheet_2")# Print the DataFrameprint('Output DataFrame:')print(df)

Following is the output of the above code −

Output DataFrame:
NameValue
0string11
1string23
2Comment5

Reading Multiple Sheets as a Dictionary of DataFrames

If an Excel file contains multiple sheets and you need to read few of them into the Pandas DataFrame, you can pass a list of sheet names or indices to thesheet_name parameter of thepandas.read_excel() method.

Example

This example uses thepandas.read_excel() method to read the multiple sheets in an Excel file to a dictionary of DataFrames.

import pandas as pd# Read multiple sheetsdf = pd.read_excel('data.xlsx', sheet_name=[0, 1])# Print the DataFrameprint('Output Dict of DataFrames:')print(df)

Following is the output of the above code −

Output Dict of DataFrames:{0:    Sr.no      Name  Gender  Age0      1    Braund  female   381      2   Cumings    male   222      3   Heikkin  female   353      4  Futrelle  female   26, 1:       Name  Value0  string1      11  string2      32  Comment      5}

Reading MultiIndex Data from Excel

You can readMultiIndexed data from an Excel file using thepandas.read_excel() method, which is useful for working with hierarchical data. By specifying the lists of columns forindex_col and lists of rows forheader parameters to handle MultiIndex indices and columns.

Example

This example uses theto_excel() method andpandas.read_excel() method create an excel sheet withMultiIndexed data and read it back to Pandas DataFrame respectively.

import pandas as pd# Create a MultiIndex objectindex = pd.MultiIndex.from_tuples([('A', 'one'), ('A', 'two'), ('B', 'one'), ('B', 'two')])# Create a DataFramedata = [[1, 2], [3, 4], [5, 6], [7, 8]]df = pd.DataFrame(data, index=index, columns=['X', 'Y'])df.to_excel("multiindex_data.xlsx")# Read MultiIndex rows and columnsdf = pd.read_excel("multiindex_data.xlsx", index_col=[0, 1])print('Output DataFrame from Excel File:')print(df)

Following is the output of the above code −

Output DataFrame from Excel File:
XY
Aone12
two34
Bone56
two78
Print Page
Advertisements

[8]ページ先頭

©2009-2025 Movatter.jp