pyarrow.csv.ParseOptions

class pyarrow.csv.ParseOptions(delimiter=None, *, quote_char=None, double_quote=None, escape_char=None, newlines_in_values=None, ignore_empty_lines=None, invalid_row_handler=None)

Bases: _Weakrefable

Options for parsing CSV files.

Parameters:
delimiter : 1-character str, optional (default ',')

The character delimiting individual cells in the CSV data.

quote_char : 1-character str or False, optional (default '"')

The character used optionally for quoting CSV values (False if quoting is not allowed).

double_quote : bool, optional (default True)

Whether two quotes in a quoted CSV value denote a single quote in the data.

escape_char : 1-character str or False, optional (default False)

The character used optionally for escaping special characters (False if escaping is not allowed).

newlines_in_values : bool, optional (default False)

Whether newline characters are allowed in CSV values. Setting this to True reduces the performance of multi-threaded CSV reading.

ignore_empty_lines : bool, optional (default True)

Whether empty lines are ignored in CSV input. If False, an empty line is interpreted as containing a single empty value (assuming a one-column CSV file).

invalid_row_handler : callable, optional (default None)

If not None, this object is called for each CSV row that fails parsing (because of a mismatching number of columns). It should accept a single InvalidRow argument and return either "skip" or "error" depending on the desired outcome.

Examples

Defining an example file from a bytes object:

>>> import io
>>> s = (
...     "animals;n_legs;entry\n"
...     "Flamingo;2;2022-03-01\n"
...     "# Comment here:\n"
...     "Horse;4;2022-03-02\n"
...     "Brittle stars;5;2022-03-03\n"
...     "Centipede;100;2022-03-04"
... )
>>> print(s)
animals;n_legs;entry
Flamingo;2;2022-03-01
# Comment here:
Horse;4;2022-03-02
Brittle stars;5;2022-03-03
Centipede;100;2022-03-04
>>> source = io.BytesIO(s.encode())

Read the data from a file, skipping rows with comments and defining the delimiter:

>>> from pyarrow import csv
>>> def skip_comment(row):
...     if row.text.startswith("# "):
...         return 'skip'
...     else:
...         return 'error'
...
>>> parse_options = csv.ParseOptions(delimiter=";", invalid_row_handler=skip_comment)
>>> csv.read_csv(source, parse_options=parse_options)
pyarrow.Table
animals: string
n_legs: int64
entry: date32[day]
----
animals: [["Flamingo","Horse","Brittle stars","Centipede"]]
n_legs: [[2,4,5,100]]
entry: [[2022-03-01,2022-03-02,2022-03-03,2022-03-04]]

__init__(*args, **kwargs)

Methods

__init__(*args, **kwargs)

equals(self, ParseOptions other)

validate(self)

Attributes

delimiter

The character delimiting individual cells in the CSV data.

double_quote

Whether two quotes in a quoted CSV value denote a single quote in the data.

escape_char

The character used optionally for escaping special characters (False if escaping is not allowed).

ignore_empty_lines

Whether empty lines are ignored in CSV input.

invalid_row_handler

Optional handler for invalid rows.

newlines_in_values

Whether newline characters are allowed in CSV values.

quote_char

The character used optionally for quoting CSV values (False if quoting is not allowed).

delimiter

The character delimiting individual cells in the CSV data.

double_quote

Whether two quotes in a quoted CSV value denote a single quote in the data.

equals(self, ParseOptions other)
Parameters:
other : pyarrow.csv.ParseOptions
Returns:
bool
escape_char

The character used optionally for escaping special characters (False if escaping is not allowed).

ignore_empty_lines

Whether empty lines are ignored in CSV input. If False, an empty line is interpreted as containing a single empty value (assuming a one-column CSV file).

invalid_row_handler

Optional handler for invalid rows.

If not None, this object is called for each CSV row that fails parsing (because of a mismatching number of columns). It should accept a single InvalidRow argument and return either "skip" or "error" depending on the desired outcome.

newlines_in_values

Whether newline characters are allowed in CSV values. Setting this to True reduces the performance of multi-threaded CSV reading.

quote_char

The character used optionally for quoting CSV values (False if quoting is not allowed).

validate(self)