pyarrow.Schema

class pyarrow.Schema

Bases: _Weakrefable

A named collection of types, a.k.a. a schema. A schema defines the column names and types in a record batch or table data structure. Schemas can also contain metadata about the columns. For example, schemas converted from Pandas contain metadata about their original Pandas types so they can be converted back to the same types.

Warning

Do not call this class’s constructor directly. Instead use the pyarrow.schema() factory function, which makes a new Arrow Schema object.

Examples

Create a new Arrow Schema object:

>>> import pyarrow as pa
>>> pa.schema([
...     ('some_int', pa.int32()),
...     ('some_string', pa.string())
... ])
some_int: int32
some_string: string

Create Arrow Schema with metadata:

>>> pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())],
...     metadata={"n_legs": "Number of legs per animal"})
n_legs: int64
animals: string
-- schema metadata --
n_legs: 'Number of legs per animal'

__init__(*args, **kwargs)

Methods

`__init__`(*args, **kwargs)
`add_metadata`(self, metadata)	DEPRECATED
`append`(self, Field field)	Append a field at the end of the schema.
`empty_table`(self)	Provide an empty table according to the schema.
`equals`(self, Schema other, ...)	Test if this schema is equal to the other
`field`(self, i)	Select a field by its column name or numeric index.
`field_by_name`(self, name)	DEPRECATED
`from_pandas`(cls, df[, preserve_index])	Returns implied schema from dataframe
`get_all_field_indices`(self, name)	Return sorted list of indices for the fields with the given name.
`get_field_index`(self, name)	Return index of the unique field with the given name.
`insert`(self, int i, Field field)	Add a field at position i to the schema.
`remove`(self, int i)	Remove the field at index i from the schema.
`remove_metadata`(self)	Create new schema without metadata, if any
`serialize`(self[, memory_pool])	Write Schema to Buffer as encapsulated IPC message
`set`(self, int i, Field field)	Replace a field at position i in the schema.
`to_string`(self[, truncate_metadata, ...])	Return human-readable representation of Schema
`with_metadata`(self, metadata)	Add metadata as dict of string keys and values to Schema

Attributes

`metadata`	The schema's metadata (if any is set).
`names`	The schema's field names.
`pandas_metadata`	Return deserialized-from-JSON pandas metadata field (if it exists)
`types`	The schema's field types.

add_metadata(self, metadata)

DEPRECATED

Parameters:

metadata : dict
    Keys and values must be string-like / coercible to bytes

append(self, Field field)

Append a field at the end of the schema.

In contrast to Python’s list.append(), it returns a new object, leaving the original Schema unmodified.

Parameters:

field : Field

Returns:

schema : Schema
    New object with appended field.

Examples

>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])

Append a field ‘extra’ at the end of the schema:

>>> schema_new = schema.append(pa.field('extra', pa.bool_()))
>>> schema_new
n_legs: int64
animals: string
extra: bool

Original schema is unmodified:

>>> schema
n_legs: int64
animals: string

empty_table(self)

Provide an empty table according to the schema.

Returns:

table : pyarrow.Table

Examples

>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])

Create an empty table with schema’s fields:

>>> schema.empty_table()
pyarrow.Table
n_legs: int64
animals: string
----
n_legs: [[]]
animals: [[]]

equals(self, Schema other, bool check_metadata=False)

Test if this schema is equal to the other

Parameters:

other : pyarrow.Schema
check_metadata : bool, default False
    Key/value metadata must be equal too

Returns:

is_equal : bool

Examples

>>> import pyarrow as pa
>>> schema1 = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())],
...     metadata={"n_legs": "Number of legs per animal"})
>>> schema2 = pa.schema([
...     ('some_int', pa.int32()),
...     ('some_string', pa.string())
... ])

Test two equal schemas:

>>> schema1.equals(schema1)
True

Test two unequal schemas:

>>> schema1.equals(schema2)
False

field(self, i)

Select a field by its column name or numeric index.

Parameters:

i : int or str

Returns:

pyarrow.Field

Examples

>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])

Select the second field:

>>> schema.field(1)
pyarrow.Field<animals: string>

Select the field of the column named ‘n_legs’:

>>> schema.field('n_legs')
pyarrow.Field<n_legs: int64>

field_by_name(self, name)

DEPRECATED

Parameters:

name : str

Returns:

field : pyarrow.Field

classmethod from_pandas(cls, df, preserve_index=None)

Returns implied schema from dataframe

Parameters:

df : pandas.DataFrame
preserve_index : bool, default None
    Whether to store the index as an additional column (or columns, for MultiIndex) in the resulting Table. The default of None will store the index as a column, except for RangeIndex which is stored as metadata only. Use preserve_index=True to force it to be stored as a column.

Returns:

pyarrow.Schema

Examples

>>> import pandas as pd
>>> import pyarrow as pa
>>> df = pd.DataFrame({
...     'int': [1, 2],
...     'str': ['a', 'b']
... })

Create an Arrow Schema from the schema of a pandas dataframe:

>>> pa.Schema.from_pandas(df)
int: int64
str: string
-- schema metadata --
pandas: '{"index_columns": [{"kind": "range", "name": null, ...

get_all_field_indices(self, name)

Return sorted list of indices for the fields with the given name.

Parameters:

name : str
    The name of the field to look up.

Returns:

indices : List[int]

Examples

>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string()),
...     pa.field('animals', pa.bool_())])

Get the indexes of the fields named ‘animals’:

>>> schema.get_all_field_indices("animals")
[1, 2]

get_field_index(self, name)

Return index of the unique field with the given name.

Parameters:

name : str
    The name of the field to look up.

Returns:

index : int
    The index of the field with the given name; -1 if the name isn’t found or there are several fields with the given name.

Examples

>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])

Get the index of the field named ‘animals’:

>>> schema.get_field_index("animals")
1

Index in case of several fields with the given name:

>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string()),
...     pa.field('animals', pa.bool_())],
...     metadata={"n_legs": "Number of legs per animal"})
>>> schema.get_field_index("animals")
-1

insert(self, int i, Field field)

Add a field at position i to the schema.

Parameters:

i : int
field : Field

Returns:

schema : Schema

Examples

>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])

Insert a new field on the second position:

>>> schema.insert(1, pa.field('extra', pa.bool_()))
n_legs: int64
extra: bool
animals: string

metadata

The schema’s metadata (if any is set).

Returns:

metadata : dict or None

Examples

>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())],
...     metadata={"n_legs": "Number of legs per animal"})

Get the schema’s metadata:

>>> schema.metadata
{b'n_legs': b'Number of legs per animal'}

names

The schema’s field names.

Returns:

list of str

Examples

>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])

Get the names of the schema’s fields:

>>> schema.names
['n_legs', 'animals']

pandas_metadata

Return deserialized-from-JSON pandas metadata field (if it exists)

Examples

>>> import pyarrow as pa
>>> import pandas as pd
>>> df = pd.DataFrame({'n_legs': [2, 4, 5, 100],
...                    'animals': ["Flamingo", "Horse", "Brittle stars", "Centipede"]})
>>> schema = pa.Table.from_pandas(df).schema

Select pandas metadata field from Arrow Schema:

>>> schema.pandas_metadata
{'index_columns': [{'kind': 'range', 'name': None, 'start': 0, 'stop': 4, 'step': 1}], ...

remove(self, int i)

Remove the field at index i from the schema.

Parameters:

i : int

Returns:

schema : Schema

Examples

>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])

Remove the second field of the schema:

>>> schema.remove(1)
n_legs: int64

remove_metadata(self)

Create new schema without metadata, if any

Returns:

schema : pyarrow.Schema

Examples

>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())],
...     metadata={"n_legs": "Number of legs per animal"})
>>> schema
n_legs: int64
animals: string
-- schema metadata --
n_legs: 'Number of legs per animal'

Create a new schema by removing the metadata from the original:

>>> schema.remove_metadata()
n_legs: int64
animals: string

serialize(self, memory_pool=None)

Write Schema to Buffer as encapsulated IPC message

Parameters:

memory_pool : MemoryPool, default None
    Uses default memory pool if not specified

Returns:

serialized : Buffer

Examples

>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])

Write schema to Buffer:

>>> schema.serialize()
<pyarrow.Buffer address=0x... size=... is_cpu=True is_mutable=True>

set(self, int i, Field field)

Replace a field at position i in the schema.

Parameters:

i : int
field : Field

Returns:

schema : Schema

Examples

>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])

Replace the second field of the schema with a new field ‘replaced’:

>>> schema.set(1, pa.field('replaced', pa.bool_()))
n_legs: int64
replaced: bool

to_string(self, truncate_metadata=True, show_field_metadata=True, show_schema_metadata=True)

Return human-readable representation of Schema

Parameters:

truncate_metadata : bool, default True
    Limit metadata key/value display to a single line of ~80 characters or less
show_field_metadata : bool, default True
    Display Field-level KeyValueMetadata
show_schema_metadata : bool, default True
    Display Schema-level KeyValueMetadata

Returns:

str: the formatted output

types

The schema’s field types.

Returns:

list of DataType

Examples

>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])

Get the types of the schema’s fields:

>>> schema.types
[DataType(int64), DataType(string)]

with_metadata(self, metadata)

Add metadata as dict of string keys and values to Schema

Parameters:

metadata : dict
    Keys and values must be string-like / coercible to bytes

Returns:

schema : pyarrow.Schema

Examples

>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])

Add metadata to an existing schema:

>>> schema.with_metadata({"n_legs": "Number of legs per animal"})
n_legs: int64
animals: string
-- schema metadata --
n_legs: 'Number of legs per animal'
