
- Python Pandas - Home
- Python Pandas - Introduction
- Python Pandas - Environment Setup
- Python Pandas - Basics
- Python Pandas - Introduction to Data Structures
- Python Pandas - Index Objects
- Python Pandas - Panel
- Python Pandas - Basic Functionality
- Python Pandas - Indexing & Selecting Data
- Python Pandas - Series
- Python Pandas - Series
- Python Pandas - Slicing a Series Object
- Python Pandas - Attributes of a Series Object
- Python Pandas - Arithmetic Operations on Series Object
- Python Pandas - Converting Series to Other Objects
- Python Pandas - DataFrame
- Python Pandas - DataFrame
- Python Pandas - Accessing DataFrame
- Python Pandas - Slicing a DataFrame Object
- Python Pandas - Modifying DataFrame
- Python Pandas - Removing Rows from a DataFrame
- Python Pandas - Arithmetic Operations on DataFrame
- Python Pandas - IO Tools
- Python Pandas - IO Tools
- Python Pandas - Working with CSV Format
- Python Pandas - Reading & Writing JSON Files
- Python Pandas - Reading Data from an Excel File
- Python Pandas - Writing Data to Excel Files
- Python Pandas - Working with HTML Data
- Python Pandas - Clipboard
- Python Pandas - Working with HDF5 Format
- Python Pandas - Comparison with SQL
- Python Pandas - Data Handling
- Python Pandas - Sorting
- Python Pandas - Reindexing
- Python Pandas - Iteration
- Python Pandas - Concatenation
- Python Pandas - Statistical Functions
- Python Pandas - Descriptive Statistics
- Python Pandas - Working with Text Data
- Python Pandas - Function Application
- Python Pandas - Options & Customization
- Python Pandas - Window Functions
- Python Pandas - Aggregations
- Python Pandas - Merging/Joining
- Python Pandas - MultiIndex
- Python Pandas - Basics of MultiIndex
- Python Pandas - Indexing with MultiIndex
- Python Pandas - Advanced Reindexing with MultiIndex
- Python Pandas - Renaming MultiIndex Labels
- Python Pandas - Sorting a MultiIndex
- Python Pandas - Binary Operations
- Python Pandas - Binary Comparison Operations
- Python Pandas - Boolean Indexing
- Python Pandas - Boolean Masking
- Python Pandas - Data Reshaping & Pivoting
- Python Pandas - Pivoting
- Python Pandas - Stacking & Unstacking
- Python Pandas - Melting
- Python Pandas - Computing Dummy Variables
- Python Pandas - Categorical Data
- Python Pandas - Categorical Data
- Python Pandas - Ordering & Sorting Categorical Data
- Python Pandas - Comparing Categorical Data
- Python Pandas - Handling Missing Data
- Python Pandas - Missing Data
- Python Pandas - Filling Missing Data
- Python Pandas - Interpolation of Missing Values
- Python Pandas - Dropping Missing Data
- Python Pandas - Calculations with Missing Data
- Python Pandas - Handling Duplicates
- Python Pandas - Duplicated Data
- Python Pandas - Counting & Retrieving Unique Elements
- Python Pandas - Duplicated Labels
- Python Pandas - Grouping & Aggregation
- Python Pandas - GroupBy
- Python Pandas - Time-series Data
- Python Pandas - Date Functionality
- Python Pandas - Timedelta
- Python Pandas - Sparse Data Structures
- Python Pandas - Sparse Data
- Python Pandas - Visualization
- Python Pandas - Visualization
- Python Pandas - Additional Concepts
- Python Pandas - Caveats & Gotchas
Python Pandas - Reading Tabular Data
The Pandas library in Python provides a wide variety of functions to read tabular data from different sources, including CSV, Excel, SQL databases, JSON files, and more. Depending on your data format, you can choose an appropriate function (read_table(), read_csv(), read_excel(), etc.) to load the data into a PandasDataFrame for further analysis.
In this tutorial, we will learn about commonly used Pandas methods for reading tabular data, including examples for each format. Whether you're working with CSV files, Excel spreadsheets, or other delimited text files, Pandas makes it easy to load and analyze your tabular data in Pandas.
Common Methods in Pandas for Reading Tabular Data
Pandas provides several functions to load tabular data into Pandas objects. You can choose an appropriate method depending on the file format. The most commonly used methods are −
read_table(): Used for reading tab-separated values (TSV) or other delimited text files.
read_csv(): Used for reading CSV (comma-separated values) files.
read_excel(): Used for reading Excel files (.xls, .xlsx).
read_sql(): Used for reading data from a SQL database.
read_json(): Used for reading JSON files.
read_html(): Used for reading HTML tables.
Reading Tab-Delimited Files
Theread_table() function is used to read tab-separated values (TSV) or other delimited files. This method assumes that the data provided in the text file is separated by tabs (\t) by default.
Example
This example demonstrates reading tabular data from a tab-delimited text file to Pandas DataFrame using theread_table() method. In this example, we explicitly specified a column name to set it as the DataFrame index using theindex_col parameter.
import pandas as pd# Import StringIO to load a file-like objectfrom io import StringIO# Create a tab-delimited datadata = """Sr.no\tName\tGender\tAge1\tBraund\tmale\t222\tCumings\tfemale\t383\tHeikkinen\tfemale\t264\tFutrelle\tfemale\t35"""# Use StringIO to convert the string data into a file-like objectobj = StringIO(data)# Reading tab-delimited datadf = pd.read_table(obj, index_col="Sr.no")print("DataFrame from Tab-Delimited File:")print(df)Following is an output of the above code −
DataFrame from Tab-Delimited File:
| Name | Gender | Age | |
|---|---|---|---|
| Sr.no | |||
| 1 | Braund | male | 22 |
| 2 | Cumings | female | 38 |
| 3 | Heikkinen | female | 26 |
| 4 | Futrelle | female | 35 |
Reading CSV Files
The most common format for tabular data is CSV (comma-separated values). Pandas provides theread_csv() method to read tabular data from CSV files into a DataFrame. This method provides various customization options through its parameters, likeskiprows to skip rows,dtype to specify column data types, andna_values to treat specific values as NaN, and more.
Example
This example shows loading CSV data into Pandas DataFrame using theread_csv() method.
import pandas as pdurl ="https://raw.githubusercontent.com/Opensourcefordatascience/Data-sets/master/blood_pressure.csv"# Read a CSV file into a DataFramedf = pd.read_csv(url)# Display the first few rowsprint("Loaded CSV Tabular Data:")print(df.head())When we run the above program, it produces the following result −
Loaded CSV Tabular Data:
| patient | sex | agegrp | bp_before | bp_after | |
|---|---|---|---|---|---|
| 0 | 1 | Male | 30-45 | 143 | 153 |
| 1 | 2 | Male | 30-45 | 163 | 170 |
| 2 | 3 | Male | 30-45 | 153 | 168 |
| 3 | 4 | Male | 30-45 | 153 | 142 |
| 4 | 5 | Male | 30-45 | 146 | 141 |
Reading Excel Files
Excel is one of the most popular formats for storing tabular data. Pandas provides theread_excel() function to load data from Excel files (.xls, .xlsx).
Example
This example shows how to read tabular data from an excel file to Pandas DataFrame using theread_excel() method.
import pandas as pd# Read an Excel file df = pd.read_excel('data.xlsx')# Print the DataFrameprint("DataFrame from Excel File:")print(df)While executing the above code we get the following output −
DataFrame from Excel File:
| Car | Date_of_purchase | |
|---|---|---|
| 0 | BMW | 10-10-2024 |
| 1 | Lexus | 12-10-2024 |
| 2 | Audi | 17-10-2024 |
| 3 | Mercedes | 16-10-2024 |
| 4 | Jaguar | 19-10-2024 |
| 5 | Bentley | 22-10-2024 |
Reading Data from SQL Databases
Pandas provides theread_sql() method to load SQL table, execute SQL queries and load the result into a DataFrame.
Example
This example shows creating an in-memory SQL table from a DataFrame and loading back to new DataFrame using theto_sql() andread_sql() methods respectively.
import pandas as pdfrom sqlite3 import connect# Create a database connectionconn = connect(':memory:')# Create a sample DataFramedf = pd.DataFrame({"Col_1": list("abc"),"Col_2": list(range(1, 4)),"Col_3": pd.date_range("20240101", periods=3)})# Save the DataFrame in a SQL table with 'test_data' namedf.to_sql(name='test_data', con=conn)# Read SQL table into a DataFrameresult = pd.read_sql('SELECT Col_1, Col_2, Col_3 FROM test_data', conn)# Display the retrieved dataprint("DataFrame from SQL Table:")print(result.head())When we run the above program, it produces the following result −
DataFrame from SQL Table:
| Col_1 | Col_2 | Col_3 | |
|---|---|---|---|
| 0 | a | 1 | 2024-01-01 00:00:00 |
| 1 | b | 2 | 2024-01-02 00:00:00 |
| 2 | c | 3 | 2024-01-03 00:00:00 |
Reading JSON Files
JSON (JavaScript Object Notation) is a popular data interchange format. You can useread_json() to read data from JSON files into a DataFrame.
Example
This example shows reading JSON file data into Pandas DataFrame using theread_json() method.
import pandas as pd# Create a sample DataFramedf = pd.DataFrame({"Col_1": list("abc"),"Col_2": list(range(1, 4)),"Col_3": pd.date_range("20240101", periods=3)})# Write DataFrame to a JSON filedf.to_json("output_json_file.json")# Read JSON data into a DataFrameresult = pd.read_json("output_json_file.json")# Display the retrieved dataprint('DataFrame from JSON file:')print(result.head())When we run the above program, it produces the following result −
DataFrame from JSON file:
| Col_1 | Col_2 | Col_3 | |
|---|---|---|---|
| 0 | a | 1 | 2024-01-01 00:00:00 |
| 1 | b | 2 | 2024-01-02 00:00:00 |
| 2 | c | 3 | 2024-01-03 00:00:00 |
Reading HTML Tables
If you need to read tables directly from an HTML file or webpage,read_html() can scrape and parse HTML tables into a Pandas DataFrames.
Example
This example demonstrates reading tabular data from a HTML file using theread_html() method.
import pandas as pd# Create a HTML stringhtml_str = """<table> <tr><th>Col_1</th><th>Col_2</th><th>Col_3</th></tr> <tr><td>a</td><td>b</td><td>c</td></tr> <tr><td>x</td><td>y</td><td>z</td></tr></table>"""# Save to a temporary file and readwith open("temp.html", "w") as f: f.write(html_str)df = pd.read_html("temp.html")[0]# Display the outputprint("DataFrame from HTML File:")print(df)Following is an output of the above code −
DataFrame from HTML File:
| Col_1 | Col_2 | Col_3 | |
|---|---|---|---|
| 0 | a | b | c |
| 1 | x | y | z |