pyarrow.csv.ReadOptions

class pyarrow.csv.ReadOptions(use_threads=None, *, block_size=None, skip_rows=None, skip_rows_after_names=None, column_names=None, autogenerate_column_names=None, encoding='utf8')

Bases: _Weakrefable

Options for reading CSV files.

Parameters:

use_threads (bool, optional, default True): Whether to use multiple threads to accelerate reading.
block_size (int, optional): How many bytes to process at a time from the input stream. This determines multi-threading granularity as well as the size of individual record batches or table chunks. The minimum valid value for block size is 1.
skip_rows (int, optional, default 0): The number of rows to skip before the column names (if any) and the CSV data.
skip_rows_after_names (int, optional, default 0): The number of rows to skip after the column names. This number can be larger than the number of rows in one block, and empty rows are counted. The order of application is as follows:
  - skip_rows is applied (if non-zero);
  - column names are read (unless column_names is set);
  - skip_rows_after_names is applied (if non-zero).
column_names (list, optional): The column names of the target table. If empty, fall back on autogenerate_column_names.
autogenerate_column_names (bool, optional, default False): Whether to autogenerate column names if column_names is empty. If true, column names will be of the form “f0”, “f1”… If false, column names will be read from the first CSV row after skip_rows.
encoding (str, optional, default 'utf8'): The character encoding of the CSV data. Columns that cannot be decoded using this encoding can still be read as Binary.

Examples

Define some example data:

>>> import io
>>> s = "1,2,3\nFlamingo,2,2022-03-01\nHorse,4,2022-03-02\nBrittle stars,5,2022-03-03\nCentipede,100,2022-03-04"
>>> print(s)
1,2,3
Flamingo,2,2022-03-01
Horse,4,2022-03-02
Brittle stars,5,2022-03-03
Centipede,100,2022-03-04

Ignore the first numbered row and substitute it with defined or autogenerated column names:

>>> from pyarrow import csv
>>> read_options = csv.ReadOptions(
...     column_names=["animals", "n_legs", "entry"],
...     skip_rows=1)
>>> csv.read_csv(io.BytesIO(s.encode()), read_options=read_options)
pyarrow.Table
animals: string
n_legs: int64
entry: date32[day]
----
animals: [["Flamingo","Horse","Brittle stars","Centipede"]]
n_legs: [[2,4,5,100]]
entry: [[2022-03-01,2022-03-02,2022-03-03,2022-03-04]]

>>> read_options = csv.ReadOptions(autogenerate_column_names=True,
...                                skip_rows=1)
>>> csv.read_csv(io.BytesIO(s.encode()), read_options=read_options)
pyarrow.Table
f0: string
f1: int64
f2: date32[day]
----
f0: [["Flamingo","Horse","Brittle stars","Centipede"]]
f1: [[2,4,5,100]]
f2: [[2022-03-01,2022-03-02,2022-03-03,2022-03-04]]

Remove the first 2 rows of the data:

>>> read_options = csv.ReadOptions(skip_rows_after_names=2)
>>> csv.read_csv(io.BytesIO(s.encode()), read_options=read_options)
pyarrow.Table
1: string
2: int64
3: date32[day]
----
1: [["Brittle stars","Centipede"]]
2: [[5,100]]
3: [[2022-03-03,2022-03-04]]

__init__(*args, **kwargs)

Methods

`__init__`(\*args, \*\*kwargs)
`equals`(self, ReadOptions other)
`validate`(self)

Attributes

`autogenerate_column_names`	Whether to autogenerate column names if column_names is empty.
`block_size`	How many bytes to process at a time from the input stream.
`column_names`	The column names of the target table.
`encoding`	The character encoding of the CSV data.
`skip_rows`	The number of rows to skip before the column names (if any) and the CSV data.
`skip_rows_after_names`	The number of rows to skip after the column names.
`use_threads`	Whether to use multiple threads to accelerate reading.

autogenerate_column_names: Whether to autogenerate column names if column_names is empty. If true, column names will be of the form “f0”, “f1”… If false, column names will be read from the first CSV row after skip_rows.

block_size: How many bytes to process at a time from the input stream. This determines multi-threading granularity as well as the size of individual record batches or table chunks.

column_names: The column names of the target table. If empty, fall back on autogenerate_column_names.

encoding: The character encoding of the CSV data.

equals(self, ReadOptions other)

Parameters:

other (pyarrow.csv.ReadOptions)

Returns:

bool

skip_rows: The number of rows to skip before the column names (if any) and the CSV data. See skip_rows_after_names for a description of how the two options interact.

skip_rows_after_names: The number of rows to skip after the column names. This number can be larger than the number of rows in one block, and empty rows are counted. The order of application is as follows:
  - skip_rows is applied (if non-zero);
  - column names are read (unless column_names is set);
  - skip_rows_after_names is applied (if non-zero).

use_threads: Whether to use multiple threads to accelerate reading.

validate(self)
