pandas.read_stata

pandas.read_stata(filepath_or_buffer, *, convert_dates=True, convert_categoricals=True, index_col=None, convert_missing=False, preserve_dtypes=True, columns=None, order_categoricals=True, chunksize=None, iterator=False, compression='infer', storage_options=None)

Read Stata file into DataFrame.

Parameters:

filepath_or_buffer : str, path object or file-like object

Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is expected. A local file could be: file://localhost/path/to/table.dta.

If you want to pass in a path object, pandas accepts any os.PathLike.

By file-like object, we refer to objects with a read() method, such as a file handle (e.g. via the builtin open function, opened in binary mode) or BytesIO. Stata files are binary, so the object must return bytes.
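The three accepted input kinds can be compared side by side. This is a minimal sketch, assuming a small DataFrame written to a temporary file first; the file name animals.dta and the variable names are illustrative only.

```python
import io
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({"animal": ["falcon", "parrot"], "speed": [350, 18]})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "animals.dta"  # illustrative file name
    df.to_stata(path, write_index=False)

    # 1. A path object (any os.PathLike).
    from_path = pd.read_stata(path)

    # 2. A file handle opened in binary mode.
    with open(path, "rb") as handle:
        from_handle = pd.read_stata(handle)

    # 3. An in-memory binary buffer holding the file's bytes.
    from_buffer = pd.read_stata(io.BytesIO(path.read_bytes()))
```

All three calls should return the same DataFrame.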

convert_dates : bool, default True

Convert date variables to DataFrame time values.
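A short sketch of the effect of convert_dates, assuming a datetime column that to_stata stores in its default date format; the column and file names are made up for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame(
    {
        "when": pd.to_datetime(["2020-01-01", "2021-06-15"]),
        "value": [1.5, 2.5],
    }
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "dates.dta"  # illustrative file name
    df.to_stata(path, write_index=False)

    # Default: Stata date variables come back as pandas datetimes.
    converted = pd.read_stata(path)

    # convert_dates=False: the stored numeric date values are returned as-is.
    raw = pd.read_stata(path, convert_dates=False)
```

Reading the raw values can be useful when the date encoding should be handled manually.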

convert_categoricals : bool, default True

Read value labels and convert columns to Categorical/Factor variables.

index_col : str, optional

Column to set as index.

convert_missing : bool, default False

Flag indicating whether to convert missing values to their Stata representations. If False, missing values are replaced with nan. If True, columns containing missing values are returned with object data types and missing values are represented by StataMissingValue objects.
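The two behaviours can be seen on a small round trip. This is a sketch under the assumption that a NaN in a float column is written as a Stata missing value by to_stata; the file name is illustrative.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd
from pandas.io.stata import StataMissingValue

df = pd.DataFrame({"speed": [350.0, np.nan, 361.0]})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "missing.dta"  # illustrative file name
    df.to_stata(path, write_index=False)

    # Default: missing values are replaced with nan, the column stays numeric.
    as_nan = pd.read_stata(path)

    # convert_missing=True: the column becomes object dtype and the
    # missing entry is a StataMissingValue instance.
    as_stata = pd.read_stata(path, convert_missing=True)

missing = as_stata.loc[1, "speed"]
```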

preserve_dtypes : bool, default True

Preserve Stata datatypes. If False, numeric data are upcast to pandas default types for foreign data (float64 or int64).
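As an illustration of the upcast, the sketch below writes an int8 and a float32 column and reads them back both ways. The column names and file name are hypothetical.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "small": np.array([1, 2, 3], dtype="int8"),
        "ratio": np.array([0.5, 1.5, 2.5], dtype="float32"),
    }
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "dtypes.dta"  # illustrative file name
    df.to_stata(path, write_index=False)

    # Default: the narrow storage types are kept.
    preserved = pd.read_stata(path)

    # preserve_dtypes=False: integers become int64, floats become float64.
    upcast = pd.read_stata(path, preserve_dtypes=False)

preserved_types = preserved.dtypes.astype(str).to_dict()
upcast_types = upcast.dtypes.astype(str).to_dict()
```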

columns : list or None

Columns to retain. Columns will be returned in the given order. None returns all columns.

order_categoricals : bool, default True

Flag indicating whether converted categorical data are ordered.
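The interaction of convert_categoricals and order_categoricals might look like the following sketch, which assumes a categorical column that to_stata stores as integer codes with value labels. Names are illustrative.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame(
    {
        "animal": pd.Categorical(["falcon", "parrot", "falcon"]),
        "speed": [350, 18, 361],
    }
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "categories.dta"  # illustrative file name
    df.to_stata(path, write_index=False)

    # Default: labelled values become an ordered Categorical.
    ordered = pd.read_stata(path)

    # Same conversion, but the resulting Categorical is unordered.
    unordered = pd.read_stata(path, order_categoricals=False)

    # No conversion: the underlying integer codes are returned.
    codes = pd.read_stata(path, convert_categoricals=False)
```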

chunksize : int, default None

Return a StataReader object for iteration that yields chunks with the given number of rows.

iterator : bool, default False

Return a StataReader object instead of a DataFrame.

compression : str or dict, default ‘infer’

For on-the-fly decompression of on-disk data. If ‘infer’ and ‘filepath_or_buffer’ is path-like, then detect compression from the following extensions: ‘.gz’, ‘.bz2’, ‘.zip’, ‘.xz’, ‘.zst’, ‘.tar’, ‘.tar.gz’, ‘.tar.xz’ or ‘.tar.bz2’ (otherwise no compression). If using ‘zip’ or ‘tar’, the ZIP file must contain only one data file to be read in. Set to None for no decompression. Can also be a dict with key 'method' set to one of {'zip', 'gzip', 'bz2', 'zstd', 'xz', 'tar'} and other key-value pairs are forwarded to zipfile.ZipFile, gzip.GzipFile, bz2.BZ2File, zstandard.ZstdDecompressor, lzma.LZMAFile or tarfile.TarFile, respectively. As an example, the following could be passed for Zstandard decompression using a custom compression dictionary: compression={'method': 'zstd', 'dict_data': my_compression_dict}.

Added in version 1.5.0: Added support for .tar files.
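A minimal sketch of inferred versus explicit decompression, assuming a gzip-compressed file produced by DataFrame.to_stata (which also infers compression from the extension). The file name is illustrative.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({"animal": ["falcon", "parrot"], "speed": [350, 18]})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "animals.dta.gz"  # illustrative file name
    df.to_stata(path, write_index=False)

    # Check the gzip magic bytes to confirm the file really is compressed.
    with open(path, "rb") as handle:
        magic = handle.read(2)

    # compression='infer' (the default) detects gzip from the '.gz' extension.
    inferred = pd.read_stata(path)

    # The same method named explicitly through the dict form.
    explicit = pd.read_stata(path, compression={"method": "gzip"})
```

The dict form is mainly useful when extra keyword arguments need to reach the decompressor.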

storage_options : dict, optional

Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib.request.Request as header options. For other URLs (e.g. starting with “s3://” and “gcs://”) the key-value pairs are forwarded to fsspec.open. Please see the fsspec and urllib documentation for more details, and refer to the pandas user guide for more examples on storage options.

Returns:

DataFrame or pandas.api.typing.StataReader

See also

io.stata.StataReader: Low-level reader for Stata data files.
DataFrame.to_stata: Export Stata data files.

Notes

Categorical variables read through an iterator may not have the same categories and dtype. This occurs when a variable stored in a DTA file is associated to an incomplete set of value labels that only label a strict subset of the values.

Examples

Create a dummy Stata file for this example:

>>> df = pd.DataFrame({'animal': ['falcon', 'parrot', 'falcon', 'parrot'],
...                    'speed': [350, 18, 361, 15]})
>>> df.to_stata('animals.dta')

Read a Stata dta file:

>>> df = pd.read_stata('animals.dta')
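The columns and index_col parameters can narrow or reshape what is read. The sketch below assumes the file was written with the default write_index=True, so the original index is stored as a regular column named index; it uses a temporary directory rather than the working directory.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame(
    {
        "animal": ["falcon", "parrot", "falcon", "parrot"],
        "speed": [350, 18, 361, 15],
    }
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "animals.dta"
    df.to_stata(path)  # default write_index=True stores an 'index' column

    # Plain read: the stored index appears as an ordinary column.
    plain = pd.read_stata(path)

    # index_col turns that column back into the DataFrame index.
    indexed = pd.read_stata(path, index_col="index")

    # columns keeps only the named columns, in the order given.
    subset = pd.read_stata(path, columns=["speed", "animal"])
```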

Read a Stata dta file in 10,000-line chunks:

>>> values = np.random.randint(0, 10, size=(20_000, 1), dtype="uint8")
>>> df = pd.DataFrame(values, columns=["i"])
>>> df.to_stata('filename.dta')

>>> with pd.read_stata('filename.dta', chunksize=10000) as itr:
...     for chunk in itr:
...         # Operate on a single chunk, e.g., chunk.mean()
...         pass
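With iterator=True the returned StataReader can also be read in explicitly sized pieces instead of a fixed chunk size. This is a sketch assuming the reader's read(nrows) method and a small illustrative file.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

df = pd.DataFrame({"i": np.arange(25, dtype="int32")})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "filename.dta"
    df.to_stata(path, write_index=False)

    # The reader is a context manager, so the file is closed on exit.
    with pd.read_stata(path, iterator=True) as reader:
        head = reader.read(10)       # first 10 rows
        following = reader.read(10)  # next 10 rows, continuing where head ended
```

Each call continues from the row where the previous one stopped.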
