
Python Pandas read_stata() Method



The read_stata() method in Python's Pandas library reads (loads) data from a Stata dataset file into a Pandas DataFrame. In other words, it lets you import data from Stata's .dta files into a DataFrame, so the data can be manipulated and analyzed in Python. Stata, developed by StataCorp, is a software package widely used for statistical analysis, and its dataset files are a common format for storing structured data.

The read_stata() method automatically handles Stata-specific data types and supports optional column selection and chunk-based reading of large datasets. It can also convert labelled variables to categorical columns, handle missing values, and preserve the original data types.
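As a minimal sketch of the categorical conversion described above, the snippet below writes a pandas Categorical column (which to_stata stores as integer codes with Stata value labels) and reads it back twice: once with the default convert_categoricals=True and once with convert_categoricals=False. The file name labels_demo.dta and the grade column are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical data: a Categorical column is written as integer codes plus value labels
df = pd.DataFrame({"grade": pd.Categorical(["low", "high", "low", "mid"])})
df.to_stata("labels_demo.dta", write_index=False)

# Default (convert_categoricals=True): value labels are applied, giving a Categorical column
labelled = pd.read_stata("labels_demo.dta")
print(labelled["grade"].dtype)
print(labelled["grade"].tolist())

# convert_categoricals=False: the stored integer codes are returned instead of the labels
codes = pd.read_stata("labels_demo.dta", convert_categoricals=False)
print(codes["grade"].tolist())
```

Reading the raw codes can be useful when the value labels are not needed or when you want to apply your own mapping.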

Syntax

Below is the syntax of the Python Pandas read_stata() method −

pandas.read_stata(filepath_or_buffer, *, convert_dates=True, convert_categoricals=True, index_col=None, convert_missing=False, preserve_dtypes=True, columns=None, order_categoricals=True, chunksize=None, iterator=False, compression='infer', storage_options=None)

Parameters

The Python Pandas read_stata() method accepts the below parameters −

  • filepath_or_buffer: A string, path object, or file-like object representing the location of the Stata dataset file to read.

  • convert_dates: A boolean indicating whether to convert Stata date variables to Pandas datetime values. By default it is set to True.

  • convert_categoricals: A boolean indicating whether to read value labels and convert the labelled columns to Categorical/Factor variables. By default it is set to True.

  • index_col: Specifies the column to use as the DataFrame index. If None (the default), no column is used as the index.

  • convert_missing: A boolean indicating whether to convert missing values to their Stata representations. If set to True, columns containing missing values are returned with object data types and missing values are represented by StataMissingValue objects. If set to False (the default), missing values are replaced with NaN.

  • preserve_dtypes: If True (the default), preserves the original data types of the variables in the Stata file. If False, numeric data are upcast to pandas default types for foreign data (float64 or int64).

  • columns: Specifies a subset of columns to include in the output. By default, it includes all columns.

  • order_categoricals: A boolean indicating whether the converted categorical data are ordered. By default it is set to True.

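  • chunksize: If specified, a StataReader object is returned that reads the data in chunks of the given number of rows.

  • iterator: If set to True, a StataReader object is returned for reading the data incrementally.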

  • compression: Specifies the compression method to use. If set to 'infer', the method will automatically detect the compression type based on the file extension (e.g., .gz, .bz2, .zip, .xz, .zst, .tar, .tar.gz, or .tar.bz2).

  • storage_options: Additional options for connecting to certain storage back-ends (e.g., AWS S3, Google Cloud Storage).
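To illustrate the convert_missing parameter, here is a minimal sketch that writes a float column containing one NaN (stored by to_stata as Stata's generic "." missing value) and reads it back with both settings. The file name missing_demo.dta and the score column are hypothetical; StataMissingValue is imported from pandas.io.stata, where pandas defines it.

```python
import numpy as np
import pandas as pd
from pandas.io.stata import StataMissingValue

# Hypothetical data: one missing entry, written as Stata's "." missing value
df = pd.DataFrame({"score": [1.5, np.nan, 3.0]})
df.to_stata("missing_demo.dta", write_index=False)

# Default (convert_missing=False): missing entries come back as NaN in a float column
as_nan = pd.read_stata("missing_demo.dta")
print(as_nan["score"].dtype, as_nan["score"].isna().sum())

# convert_missing=True: the column becomes object dtype and holds StataMissingValue objects
as_stata = pd.read_stata("missing_demo.dta", convert_missing=True)
print(as_stata["score"].dtype, type(as_stata.loc[1, "score"]).__name__)
```

Keeping StataMissingValue objects matters mainly when a file uses Stata's extended missing codes (.a to .z) and you need to tell them apart; otherwise NaN is usually more convenient.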

Return Value

The Pandas read_stata() method returns a DataFrame containing the data read from the specified Stata file, or a pandas.api.typing.StataReader object when iterator=True or chunksize is specified.
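The following sketch shows both ways of getting a StataReader instead of a DataFrame. It assumes a hypothetical file chunk_demo.dta with 10 rows; with chunksize=4, iterating over the reader yields DataFrames of at most 4 rows, while iterator=True lets you pull rows on demand with the reader's read() method. Both readers are used as context managers so the underlying file is closed afterwards.

```python
import pandas as pd

# Hypothetical file with 10 rows, written without the "index" column
pd.DataFrame({"Col_1": range(10)}).to_stata("chunk_demo.dta", write_index=False)

# chunksize: read_stata returns a StataReader; iterating yields DataFrame chunks
with pd.read_stata("chunk_demo.dta", chunksize=4) as reader:
    print(type(reader).__name__)
    chunk_sizes = [len(chunk) for chunk in reader]
print(chunk_sizes)

# iterator=True: also returns a StataReader; read(nrows) reads only the rows requested
with pd.read_stata("chunk_demo.dta", iterator=True) as reader:
    first_three = reader.read(3)
print(first_three)
```

Chunked reading keeps memory usage bounded, which is the main reason to prefer it over a single call for very large .dta files.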

Example: Basic Reading of a Stata Dataset File

Here is a basic example that reads a Stata dataset file into a Pandas DataFrame using the read_stata() method.

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({"Col_1": range(5), "Col_2": ['a', 'b', 'c', 'd', 'e']})

# Save the DataFrame to a Stata file
df.to_stata("stata_file.dta")

# Read a Stata file
result = pd.read_stata("stata_file.dta")
print("DataFrame read from Stata file:")
print(result)

When we run the above program, it produces the following result −

DataFrame read from Stata file:
   index  Col_1 Col_2
0      0      0     a
1      1      1     b
2      2      2     c
3      3      3     d
4      4      4     e
If you open the current working directory, you can see the generated stata_file.dta file.
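The extra "index" column in the output above comes from to_stata(), which by default (write_index=True) stores the DataFrame's row index as a column named "index". A minimal sketch of avoiding it is to pass write_index=False when writing; the file name stata_no_index.dta is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"Col_1": range(5), "Col_2": ['a', 'b', 'c', 'd', 'e']})

# write_index=False stops to_stata from saving the row index as an "index" column
df.to_stata("stata_no_index.dta", write_index=False)

result = pd.read_stata("stata_no_index.dta")
print(result.columns.tolist())
```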

Example: Reading Specific Columns from a Stata file

The following example demonstrates how to read specific columns from a Stata file using the read_stata() method with the columns parameter.

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({"Col_1": range(5), "Col_2": ['a', 'b', 'c', 'd', 'e']})

# Save the DataFrame to a Stata file
df.to_stata("stata_file.dta")

# Read specific columns from a Stata file
df = pd.read_stata("stata_file.dta", columns=["Col_2"])
print("Selected columns read from Stata file:")
print(df)

While executing the above code we get the following output −

Selected columns read from Stata file:
  Col_2
0     a
1     b
2     c
3     d
4     e
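The columns parameter also accepts several names at once. As a sketch of what happens with an invalid selection, the snippet below requests a column that does not exist in the file; pandas raises a ValueError in that case. The name Col_3 is deliberately hypothetical and absent from the file.

```python
import pandas as pd

df = pd.DataFrame({"Col_1": range(5), "Col_2": ['a', 'b', 'c', 'd', 'e']})
df.to_stata("stata_file.dta")

# Select several columns at once; the "index" column written by to_stata is left out
subset = pd.read_stata("stata_file.dta", columns=["Col_1", "Col_2"])
print(subset.columns.tolist())

# Requesting a column that is not in the file raises a ValueError
missing_error = None
try:
    pd.read_stata("stata_file.dta", columns=["Col_3"])
except ValueError as err:
    missing_error = err
print("ValueError:", missing_error)
```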

Example: Setting a Custom Index Column While Reading a Stata File

The following example demonstrates how to use the read_stata() method with the index_col parameter to set a column from the Stata file as the DataFrame index.

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({"Col_1": range(5), "Col_2": ['a', 'b', 'c', 'd', 'e']})

# Save the DataFrame to a Stata file
df.to_stata("stata_file.dta")

# Read a Stata file, using the "Col_2" column as the DataFrame index
df = pd.read_stata("stata_file.dta", index_col="Col_2")
print("DataFrame read from Stata file with custom index:")
print(df)

Following is the output of the above code −

DataFrame read from Stata file with custom index:
       index  Col_1
Col_2
a          0      0
b          1      1
c          2      2
d          3      3
e          4      4
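One way to see what index_col does is to compare it with reading the whole file and calling set_index() afterwards. The sketch below, using a hypothetical file stata_index_demo.dta written without the "index" column, checks that both approaches give the same DataFrame.

```python
import pandas as pd

df = pd.DataFrame({"Col_1": range(5), "Col_2": ['a', 'b', 'c', 'd', 'e']})
df.to_stata("stata_index_demo.dta", write_index=False)

# index_col moves the named column into the row index while reading
via_index_col = pd.read_stata("stata_index_demo.dta", index_col="Col_2")

# Reading everything first and calling set_index afterwards
via_set_index = pd.read_stata("stata_index_demo.dta").set_index("Col_2")

# Raises an AssertionError if the two frames differ
pd.testing.assert_frame_equal(via_index_col, via_set_index)
print(via_index_col)
```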

Example: Reading a Compressed Stata File

The read_stata() method can also read a compressed Stata file.

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({"Col_1": range(5), "Col_2": ['a', 'b', 'c', 'd', 'e']})

# Save the DataFrame to Stata with custom gzip compression
df.to_stata("compressed_file.dta.gz", compression={'method': 'gzip', 'compresslevel': 2})

# Read a compressed Stata file
df = pd.read_stata("compressed_file.dta.gz", compression="gzip")
print("DataFrame read from compressed Stata file:")
print(df)

Following is the output of the above code −

DataFrame read from compressed Stata file:
   index  Col_1 Col_2
0      0      0     a
1      1      1     b
2      2      2     c
3      3      3     d
4      4      4     e
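Because compression defaults to 'infer' for both to_stata() and read_stata(), passing it explicitly is optional when the file name has a recognized extension. A minimal sketch, assuming a hypothetical file named inferred_file.dta.gz, relies on the ".gz" suffix alone to select gzip on write and on read.

```python
import pandas as pd

df = pd.DataFrame({"Col_1": range(5), "Col_2": ['a', 'b', 'c', 'd', 'e']})

# compression='infer' (the default) picks gzip from the ".gz" suffix when writing
df.to_stata("inferred_file.dta.gz")

# The same inference applies when reading, so no compression argument is needed
result = pd.read_stata("inferred_file.dta.gz")
print(result)
```

An explicit compression argument is still needed when the extension does not reveal the format, for example when reading from a file-like object.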