
- Python Pandas - Home
- Python Pandas - Introduction
- Python Pandas - Environment Setup
- Python Pandas - Basics
- Python Pandas - Introduction to Data Structures
- Python Pandas - Index Objects
- Python Pandas - Panel
- Python Pandas - Basic Functionality
- Python Pandas - Indexing & Selecting Data
- Python Pandas - Series
- Python Pandas - Series
- Python Pandas - Slicing a Series Object
- Python Pandas - Attributes of a Series Object
- Python Pandas - Arithmetic Operations on Series Object
- Python Pandas - Converting Series to Other Objects
- Python Pandas - DataFrame
- Python Pandas - DataFrame
- Python Pandas - Accessing DataFrame
- Python Pandas - Slicing a DataFrame Object
- Python Pandas - Modifying DataFrame
- Python Pandas - Removing Rows from a DataFrame
- Python Pandas - Arithmetic Operations on DataFrame
- Python Pandas - IO Tools
- Python Pandas - IO Tools
- Python Pandas - Working with CSV Format
- Python Pandas - Reading & Writing JSON Files
- Python Pandas - Reading Data from an Excel File
- Python Pandas - Writing Data to Excel Files
- Python Pandas - Working with HTML Data
- Python Pandas - Clipboard
- Python Pandas - Working with HDF5 Format
- Python Pandas - Comparison with SQL
- Python Pandas - Data Handling
- Python Pandas - Sorting
- Python Pandas - Reindexing
- Python Pandas - Iteration
- Python Pandas - Concatenation
- Python Pandas - Statistical Functions
- Python Pandas - Descriptive Statistics
- Python Pandas - Working with Text Data
- Python Pandas - Function Application
- Python Pandas - Options & Customization
- Python Pandas - Window Functions
- Python Pandas - Aggregations
- Python Pandas - Merging/Joining
- Python Pandas - MultiIndex
- Python Pandas - Basics of MultiIndex
- Python Pandas - Indexing with MultiIndex
- Python Pandas - Advanced Reindexing with MultiIndex
- Python Pandas - Renaming MultiIndex Labels
- Python Pandas - Sorting a MultiIndex
- Python Pandas - Binary Operations
- Python Pandas - Binary Comparison Operations
- Python Pandas - Boolean Indexing
- Python Pandas - Boolean Masking
- Python Pandas - Data Reshaping & Pivoting
- Python Pandas - Pivoting
- Python Pandas - Stacking & Unstacking
- Python Pandas - Melting
- Python Pandas - Computing Dummy Variables
- Python Pandas - Categorical Data
- Python Pandas - Categorical Data
- Python Pandas - Ordering & Sorting Categorical Data
- Python Pandas - Comparing Categorical Data
- Python Pandas - Handling Missing Data
- Python Pandas - Missing Data
- Python Pandas - Filling Missing Data
- Python Pandas - Interpolation of Missing Values
- Python Pandas - Dropping Missing Data
- Python Pandas - Calculations with Missing Data
- Python Pandas - Handling Duplicates
- Python Pandas - Duplicated Data
- Python Pandas - Counting & Retrieving Unique Elements
- Python Pandas - Duplicated Labels
- Python Pandas - Grouping & Aggregation
- Python Pandas - GroupBy
- Python Pandas - Time-series Data
- Python Pandas - Date Functionality
- Python Pandas - Timedelta
- Python Pandas - Sparse Data Structures
- Python Pandas - Sparse Data
- Python Pandas - Visualization
- Python Pandas - Visualization
- Python Pandas - Additional Concepts
- Python Pandas - Caveats & Gotchas
Python Pandas - Working with CSV Format
Working with the CSV format is a common task in data analysis and data science. CSV (Comma-Separated Values) files are widely used to store tabular data because they are lightweight, human-readable, and supported by almost all data analysis tools and programming languages.
The Python Pandas library is a powerful tool for working with data, it offers extensive functionality for reading, processing, and writing data in CSV format. With Pandas, you can easily handle complex operations like filtering, grouping, and manipulating data in CSV files.
A CSV file is a plain text file where data values are separated by commas, representing tabular data in plain text format. A CSV file has a.csv extension. Below you can see how the data present in the CSV file looks like −
Sr.no,Name,Gender,Age1,Braund,male,222,Cumings,female,383,Heikkinen,female,264,Futrelle,female,35
In this tutorial, we will learn how to work with CSV files using Pandas, including reading CSV files into DataFrames, understanding alternative reading methods, and handling large datasets, to exporting data back to CSV.
Reading a CSV File in Pandas
Thepandas.read_csv() function is used to read the CSV format file into the PandasDataFrame orTextFileReader. This function accepts CSV data from a URL or a local file path to load the data into the Pandas environment.
Example
The following example demonstrates how to read CSV data using thepandas.read_csv() function. Here we are using theStringIO to load the CSV string into a file-like object.
import pandas as pd# Import StringIO to load a file-like object for reading CSVfrom io import StringIO# Create string representing CSV datadata = """Name,Gender,AgeBraund,male,22Cumings,female,38Heikkinen,female,26Futrelle,female,35"""# Use StringIO to convert the string data into a file-like objectobj = StringIO(data)# read CSV into a Pandas DataFramedf = pd.read_csv(obj)print(df)
Following is the output of the above code −
| Name | Gender | Age | |
|---|---|---|---|
| 0 | Braund | male | 22 |
| 1 | Cumings | female | 38 |
| 2 | Heikkinen | female | 26 |
| 3 | Futrelle | female | 35 |
Writing Data to a CSV File
Pandas provides a method calledto_csv() to create or write CSV file using the Pandas data structures, such as DataFrame or Series objects. This function allows you to export your data to a CSV format.
Example
Here is an example demonstrating how towrite a Pandas DataFrame to a CSV file using theDataFrame.to_csv() method.
import pandas as pd# dictionary of listsd = {'Car': ['BMW', 'Lexus', 'Audi', 'Mercedes', 'Jaguar', 'Bentley'],'Date_of_purchase': ['2024-10-10', '2024-10-12', '2024-10-17', '2024-10-16', '2024-10-19', '2024-10-22']}# creating dataframe from the above dictionary of listsdataFrame = pd.DataFrame(d)print("Original DataFrame:\n",dataFrame)# write dataFrame to SalesRecords CSV filedataFrame.to_csv("Output_written_CSV_File.csv")# display the contents of the output csvprint("The output csv file written successfully...")Following is the output of the above code −
Original DataFrame:
| Car | Date of Purchase | |
|---|---|---|
| 0 | BMW | 2024-10-10 |
| 1 | Lexus | 2024-10-12 |
| 2 | Audi | 2024-10-17 |
| 3 | Mercedes | 2024-10-16 |
| 4 | Jaguar | 2024-10-19 |
| 5 | Bentley | 2024-10-22 |
If you visit your working directory after executing the above code, you can see the created CSV file namedOutput_written_CSV_File.csv.
Handling Large CSV Files in Pandas
When working with large CSV files, loading the entire file may cause memory issues. Pandas provides option likechunksize inpandas.read_csv() function to process such files efficiently in smaller chunks.
Example
Below is an example that initially creates a large CSV file using theDataFrame.to_csv() method with random integers and then processes it in chunks using the Pandasread_csv() function.
import pandas as pdimport numpy as np# Generate a DataFrame with random integersdata = np.random.randint(0, 100, size=(1000, 5))column_names = [f"Col_{i}" for i in range(1, 5 + 1)]# Create a DataFrame and save it as a CSV filelarge_csv_file = "large_file.csv"df = pd.DataFrame(data, columns=column_names)df.to_csv(large_csv_file, index=False)print(f"Large CSV file is created successfully.\n")# Read large CSV file in chunkschunk_size = 200print("Output CSV data in chunks:")for chunk in pd.read_csv('large_file.csv', chunksize=chunk_size): print('Data in chunks:') print(chunk.head(2))While executing the above code we obtain the following output −
Large CSV file is created successfully.Output CSV data in chunks:Data in chunks:
| Col_1 | Col_2 | Col_3 | Col_4 | Col_5 | |
|---|---|---|---|---|---|
| 0 | 57 | 74 | 2 | 85 | 54 |
| 1 | 23 | 27 | 28 | 51 | 69 |
| Col_1 | Col_2 | Col_3 | Col_4 | Col_5 | |
|---|---|---|---|---|---|
| 200 | 53 | 14 | 38 | 91 | 9 |
| 201 | 96 | 90 | 0 | 13 | 54 |
| Col_1 | Col_2 | Col_3 | Col_4 | Col_5 | |
|---|---|---|---|---|---|
| 400 | 28 | 52 | 2 | 57 | 90 |
| 401 | 57 | 27 | 60 | 91 | 46 |
| Col_1 | Col_2 | Col_3 | Col_4 | Col_5 | |
|---|---|---|---|---|---|
| 600 | 97 | 15 | 66 | 67 | 30 |
| 601 | 34 | 50 | 38 | 95 | 50 |
| Col_1 | Col_2 | Col_3 | Col_4 | Col_5 | |
|---|---|---|---|---|---|
| 800 | 81 | 12 | 15 | 3 | 61 |
| 801 | 77 | 75 | 52 | 95 | 87 |
Alternatives to Reading CSV Files
In addition to thepandas.read_csv() function, Pandas provides an alternative method for reading CSV data usingpandas.read_table() function.
Thepandas.read_table() function is used to read general delimited files such as CSV, TSV, or other delimiter-separated formats into a Pandas DataFrame. It is a good alternative for loading CSV files, and it easily handles various delimiters using thesep parameter. Additionally, this function supports iterating or breaking of the file into chunks.
Example
This example shows an alternative way to load CSV data into the Pandas DataFrame using thepd.read_table() function. Here you need to specify the delimiter with thesep parameter to read comma-separated values (CSV).
import pandas as pdurl ="https://raw.githubusercontent.com/Opensourcefordatascience/Data-sets/master/blood_pressure.csv"# read CSV into a Pandas DataFrame using the read_table() functiondf = pd.read_table(url,sep=',')print(df.head(5))
Following is the output of the above code −
| Patient | Sex | Age Group | BP Before | BP After | |
|---|---|---|---|---|---|
| 0 | 1 | Male | 30-45 | 143 | 153 |
| 1 | 2 | Male | 30-45 | 163 | 170 |
| 2 | 3 | Male | 30-45 | 153 | 168 |
| 3 | 4 | Male | 30-45 | 153 | 142 |
| 4 | 5 | Male | 30-45 | 146 | 141 |