Python Pandas read_stata() Method

The read_stata() method in Python's Pandas library is used to read, or load, data from a Stata dataset file into a Pandas DataFrame. In other words, this method lets you import data from Stata's .dta files into a DataFrame, enabling easy data manipulation and analysis in Python. Stata, developed by StataCorp, is a software package widely used for statistical analysis, and its dataset files are a common format for storing structured data.

The read_stata() method supports features such as automatic handling of Stata-specific data types, optional column selection, and chunk-based reading for large datasets. It also lets you convert labelled variables to categorical data, control how missing values are represented, and preserve the original data types.
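
To see what converting categorical variables means in practice, here is a minimal sketch. The file name categorical_demo.dta and the column name grade are hypothetical. to_stata() stores a pandas categorical column as integer codes plus Stata value labels, and the convert_categoricals flag of read_stata() decides whether those codes are mapped back to their labels.

```python
import pandas as pd

# A categorical column is written to Stata as integer codes with value labels
df = pd.DataFrame({"grade": pd.Categorical(["low", "high", "mid", "low"])})
df.to_stata("categorical_demo.dta", write_index=False)

# Default (convert_categoricals=True): labels are applied, dtype is 'category'
labelled = pd.read_stata("categorical_demo.dta")
print(labelled["grade"].dtype)

# convert_categoricals=False: the raw integer codes stored in the file come back
codes = pd.read_stata("categorical_demo.dta", convert_categoricals=False)
print(codes["grade"].tolist())
```

The exact code numbers depend on how pandas orders the categories, so only the dtype is worth relying on here.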

Syntax

Below is the syntax of the Python Pandas read_stata() method −

pandas.read_stata(filepath_or_buffer, *, convert_dates=True, convert_categoricals=True, index_col=None, convert_missing=False, preserve_dtypes=True, columns=None, order_categoricals=True, chunksize=None, iterator=False, compression='infer', storage_options=None)

Parameters

The Python Pandas read_stata() method accepts the below parameters −

filepath_or_buffer: A string, path object, or file-like object representing the location of the Stata dataset file to read.
convert_dates: A boolean indicating whether to convert Stata date variables to Pandas datetime values. By default it is set to True.
convert_categoricals: A boolean indicating whether to read value labels and convert columns to Categorical/Factor variables. By default it is set to True.
index_col: Specifies the column to use as the DataFrame index. If None, no column is used as the index.
convert_missing: A boolean indicating whether to keep Stata's own missing-value representations. If set to True, columns containing missing values are returned with the object data type and missing values are represented by StataMissingValue objects. If set to False (the default), missing values are replaced with NaN.
preserve_dtypes: If True (the default), preserves the original data types of variables in the Stata file. If False, numeric data are upcast to the pandas default types for foreign data (float64 or int64).
columns: Specifies a subset of columns to include in the output. By default, it includes all columns.
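order_categoricals: A boolean indicating whether the converted categorical data are ordered. By default it is set to True.
chunksize: If specified, a StataReader object is returned that reads the file in chunks of the given number of rows. By default it is None.
iterator: If True, a StataReader object is returned instead of a DataFrame, so the data can be read incrementally. By default it is set to False.
compression: Used for on-the-fly decompression of the input file. If set to 'infer' (the default), the compression type is detected from the file extension (e.g., .gz, .bz2, .zip, .xz, .zst, .tar, .tar.gz, or .tar.bz2). It can also be given explicitly, for example as 'gzip'.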
storage_options: Additional options for connecting to certain storage back-ends (e.g., AWS S3, Google Cloud Storage).
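
The following sketch shows how convert_dates and convert_missing change what read_stata() returns. The file name flags_demo.dta and the columns when and score are illustrative assumptions. When no format is given, to_stata() writes a datetime column as a Stata 'tc' value (milliseconds since 1960), and it writes NaN as a Stata missing value.

```python
import numpy as np
import pandas as pd
from pandas.io.stata import StataMissingValue

df = pd.DataFrame({
    "when": pd.to_datetime(["2024-01-01", "2024-06-15", "2024-12-31"]),
    "score": [1.5, np.nan, 3.0],
})
df.to_stata("flags_demo.dta", write_index=False)

# Defaults: 'when' comes back as datetime, the missing score becomes NaN
converted = pd.read_stata("flags_demo.dta")
print(converted.dtypes)

# convert_dates=False: 'when' is left as the numeric value stored by Stata
raw_dates = pd.read_stata("flags_demo.dta", convert_dates=False)
print(raw_dates["when"].dtype)

# convert_missing=True: 'score' becomes an object column holding StataMissingValue
with_missing = pd.read_stata("flags_demo.dta", convert_missing=True)
print(with_missing["score"].dtype)
print(type(with_missing.loc[1, "score"]).__name__)
```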

Return Value

The Pandas read_stata() method returns a DataFrame containing the data read from the specified Stata file. When iterator=True or chunksize is set, it returns a pandas.api.typing.StataReader object instead, which reads the data incrementally.
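
For large files, a chunked read might look like the sketch below. The file chunk_demo.dta is a hypothetical example. With chunksize set, read_stata() returns a StataReader. Here it is used as a context manager so the file is closed afterwards, and it is iterated to get DataFrames of at most chunksize rows. An alternative is to pass iterator=True and call reader.read(nrows=...) yourself.

```python
import pandas as pd

df = pd.DataFrame({"Col_1": range(10), "Col_2": list("abcdefghij")})
df.to_stata("chunk_demo.dta", write_index=False)

# chunksize=4 makes read_stata() return a StataReader instead of a DataFrame
with pd.read_stata("chunk_demo.dta", chunksize=4) as reader:
    # Each iteration yields a DataFrame with up to 4 rows
    chunks = [chunk for chunk in reader]

print([len(chunk) for chunk in chunks])  # [4, 4, 2]

# The chunks can be processed one by one or stitched back together
combined = pd.concat(chunks)
print(combined)
```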

Example: Basic Reading of a Stata Dataset File

Here is a basic example demonstrating reading a Stata dataset file into a Pandas DataFrame using the read_stata() method.

import pandas as pd
# Create a DataFrame
df = pd.DataFrame({"Col_1": range(5), "Col_2": ['a', 'b', 'c', 'd', 'e']})
# Save the DataFrame to a Stata file
df.to_stata("stata_file.dta")
# Read the Stata file
result = pd.read_stata("stata_file.dta")
print("DataFrame read from Stata file:")
print(result)

When we run the above program, it produces the following result −

DataFrame read from Stata file:

	index	Col_1	Col_2
0	0	0	a
1	1	1	b
2	2	2	c
3	3	3	d
4	4	4	e

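If you open the folder where the script was run (the current working directory), you can see the generated stata_file.dta file. The extra index column in the output appears because to_stata() writes the DataFrame index to the file by default (write_index=True).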
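
The index column in the output above is the original DataFrame index stored in the file. If you do not want it, one option is to pass write_index=False when saving. The sketch below shows this with a hypothetical file name.

```python
import pandas as pd

df = pd.DataFrame({"Col_1": range(5), "Col_2": ['a', 'b', 'c', 'd', 'e']})

# write_index=False stops to_stata() from storing the index as an extra column
df.to_stata("stata_file_no_index.dta", write_index=False)

result = pd.read_stata("stata_file_no_index.dta")
print(result.columns.tolist())  # ['Col_1', 'Col_2']
```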

Example: Reading Specific Columns from a Stata file

The following example demonstrates how to read specific columns from a Stata file using the read_stata() method with the columns parameter.

import pandas as pd
# Create a DataFrame
df = pd.DataFrame({"Col_1": range(5), "Col_2": ['a', 'b', 'c', 'd', 'e']})
# Save the DataFrame to a Stata file
df.to_stata("stata_file.dta")
# Read only the "Col_2" column from the Stata file
df = pd.read_stata("stata_file.dta", columns=["Col_2"])
print("Selected columns read from Stata file:")
print(df)

While executing the above code we get the following output −

Selected columns read from Stata file:

	Col_2
0	a
1	b
2	c
3	d
4	e
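
Two further behaviours of columns may be worth checking on your pandas version. First, the selected columns appear to come back in the order you list them. Second, naming a column that is not in the file raises a ValueError. The sketch below tries both against the same stata_file.dta; the column name Col_3 is deliberately nonexistent.

```python
import pandas as pd

df = pd.DataFrame({"Col_1": range(5), "Col_2": ['a', 'b', 'c', 'd', 'e']})
df.to_stata("stata_file.dta")

# Request two columns in the reverse of their stored order
subset = pd.read_stata("stata_file.dta", columns=["Col_2", "Col_1"])
print(subset.columns.tolist())

# A column name that does not exist in the file is rejected
raised = False
try:
    pd.read_stata("stata_file.dta", columns=["Col_3"])
except ValueError as err:
    raised = True
    print("ValueError:", err)
```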

Example: Setting a Custom Index Column While Reading a Stata File

The following example demonstrates how to use the read_stata() method to set a custom index from a column of the Stata file using the index_col parameter.

import pandas as pd
# Create a DataFrame
df = pd.DataFrame({"Col_1": range(5), "Col_2": ['a', 'b', 'c', 'd', 'e']})
# Save the DataFrame to a Stata file
df.to_stata("stata_file.dta")
# Read the Stata file, using the "Col_2" column as the DataFrame index
df = pd.read_stata("stata_file.dta", index_col="Col_2")
print("DataFrame read from Stata file with custom index:")
print(df)

Following is an output of the above code −

DataFrame read from Stata file with custom index:

	index	Col_1
Col_2
a	0	0
b	1	1
c	2	2
d	3	3
e	4	4
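
The stata_file.dta written above also contains the saved index column, so that column can be passed to index_col to recover the original integer index. This minimal sketch assumes the default column name index, which to_stata() uses for an unnamed index.

```python
import pandas as pd

df = pd.DataFrame({"Col_1": range(5), "Col_2": ['a', 'b', 'c', 'd', 'e']})
# With the default write_index=True, the unnamed index is saved as a column called "index"
df.to_stata("stata_file.dta")

# Use that saved column as the DataFrame index again
restored = pd.read_stata("stata_file.dta", index_col="index")
# Optional: drop the "index" label so the index looks like the original unnamed one
restored.index.name = None
print(restored)
```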

Example: Reading a Compressed Stata File

The read_stata() method can also read compressed Stata files. In the example below, the compression type is passed explicitly.

import pandas as pd
# Create a DataFrame
df = pd.DataFrame({"Col_1": range(5), "Col_2": ['a', 'b', 'c', 'd', 'e']})
# Save the DataFrame to Stata with gzip compression (compression level 2)
df.to_stata("compressed_file.dta.gz", compression={'method': 'gzip', 'compresslevel': 2})
# Read the compressed Stata file
df = pd.read_stata("compressed_file.dta.gz", compression="gzip")
print("DataFrame read from compressed Stata file:")
print(df)
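
Since compression defaults to 'infer' for both to_stata() and read_stata(), the compression argument can usually be left out when the file name ends in .gz. The sketch below uses a hypothetical file name. It also checks the first two bytes of the file for the gzip signature, to confirm the file really was compressed.

```python
import pandas as pd

df = pd.DataFrame({"Col_1": range(5), "Col_2": ['a', 'b', 'c', 'd', 'e']})

# compression='infer' (the default) picks gzip from the ".gz" suffix when writing
df.to_stata("inferred_file.dta.gz", write_index=False)

# A gzip file starts with the magic bytes 0x1f 0x8b
with open("inferred_file.dta.gz", "rb") as fh:
    is_gzip = fh.read(2) == b"\x1f\x8b"
print("gzip file:", is_gzip)

# compression='infer' is also the read_stata() default, so no argument is needed
result = pd.read_stata("inferred_file.dta.gz")
print(result)
```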