pyarrow.Schema
- class pyarrow.Schema
Bases: _Weakrefable
A named collection of types, a.k.a. a schema. A schema defines the column names and types in a record batch or table data structure. Schemas also contain metadata about the columns. For example, schemas converted from pandas contain metadata about their original pandas types so they can be converted back to the same types.
Warning
Do not call this class’s constructor directly. Instead, use the pyarrow.schema() factory function, which creates a new Arrow Schema object.
Examples
Create a new Arrow Schema object:
>>> import pyarrow as pa
>>> pa.schema([
...     ('some_int', pa.int32()),
...     ('some_string', pa.string())
... ])
some_int: int32
some_string: string
Create Arrow Schema with metadata:
>>> pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())],
...     metadata={"n_legs": "Number of legs per animal"})
n_legs: int64
animals: string
-- schema metadata --
n_legs: 'Number of legs per animal'
- __init__(*args, **kwargs)
Methods
- __init__(*args, **kwargs)
- add_metadata(self, metadata): DEPRECATED
- append(self, Field field): Append a field at the end of the schema.
- empty_table(self): Provide an empty table according to the schema.
- equals(self, Schema other, ...): Test if this schema is equal to the other.
- field(self, i): Select a field by its column name or numeric index.
- field_by_name(self, name): DEPRECATED
- from_pandas(cls, df[, preserve_index]): Returns implied schema from dataframe.
- get_all_field_indices(self, name): Return sorted list of indices for the fields with the given name.
- get_field_index(self, name): Return index of the unique field with the given name.
- insert(self, int i, Field field): Add a field at position i to the schema.
- remove(self, int i): Remove the field at index i from the schema.
- remove_metadata(self): Create new schema without metadata, if any.
- serialize(self[, memory_pool]): Write Schema to Buffer as encapsulated IPC message.
- set(self, int i, Field field): Replace a field at position i in the schema.
- to_string(self[, truncate_metadata, ...]): Return human-readable representation of Schema.
- with_metadata(self, metadata): Add metadata as dict of string keys and values to Schema.
Attributes
- metadata: The schema's metadata (if any is set).
- names: The schema's field names.
- pandas_metadata: Return deserialized-from-JSON pandas metadata field (if it exists).
- types: The schema's field types.
- add_metadata(self, metadata)
DEPRECATED: use with_metadata() instead.
- Parameters:
  - metadata : dict
    Keys and values must be string-like / coercible to bytes
- append(self, Field field)
Append a field at the end of the schema.
In contrast to Python’s list.append(), it returns a new object, leaving the original Schema unmodified.
Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])
Append a field ‘extra’ at the end of the schema:
>>> schema_new = schema.append(pa.field('extra', pa.bool_()))
>>> schema_new
n_legs: int64
animals: string
extra: bool
Original schema is unmodified:
>>> schema
n_legs: int64
animals: string
- empty_table(self)
Provide an empty table according to the schema.
- Returns:
  - table : pyarrow.Table
Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])
Create an empty table with schema’s fields:
>>> schema.empty_table()
pyarrow.Table
n_legs: int64
animals: string
----
n_legs: [[]]
animals: [[]]
- equals(self, Schema other, bool check_metadata=False)
Test if this schema is equal to the other.
- Parameters:
  - other : pyarrow.Schema
  - check_metadata : bool, default False
    Key/value metadata must be equal too
- Returns:
  - is_equal : bool
Examples
>>> import pyarrow as pa
>>> schema1 = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())],
...     metadata={"n_legs": "Number of legs per animal"})
>>> schema2 = pa.schema([
...     ('some_int', pa.int32()),
...     ('some_string', pa.string())
... ])
Test two equal schemas:
>>> schema1.equals(schema1)
True
Test two unequal schemas:
>>> schema1.equals(schema2)
False
- field(self, i)
Select a field by its column name or numeric index.
- Parameters:
  - i : int or str
- Returns:
  - pyarrow.Field
Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])
Select the second field:
>>> schema.field(1)
pyarrow.Field<animals: string>
Select the field of the column named ‘n_legs’:
>>> schema.field('n_legs')
pyarrow.Field<n_legs: int64>
- field_by_name(self, name)
DEPRECATED: use field() instead.
- Parameters:
  - name : str
- Returns:
  - field : pyarrow.Field
- classmethod from_pandas(cls, df, preserve_index=None)
Returns implied schema from dataframe
- Parameters:
  - df : pandas.DataFrame
  - preserve_index : bool, optional
    Whether to store the index as an additional column (or columns, for MultiIndex) in the resulting Table. The default of None will store the index as a column, except for RangeIndex, which is stored as metadata only. Use preserve_index=True to force it to be stored as a column.
- Returns:
  - pyarrow.Schema
Examples
>>> import pandas as pd
>>> import pyarrow as pa
>>> df = pd.DataFrame({
...     'int': [1, 2],
...     'str': ['a', 'b']
... })
Create an Arrow Schema from the schema of a pandas dataframe:
>>> pa.Schema.from_pandas(df)
int: int64
str: string
-- schema metadata --
pandas: '{"index_columns": [{"kind": "range", "name": null, ...
- get_all_field_indices(self, name)
Return sorted list of indices for the fields with the given name.
Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string()),
...     pa.field('animals', pa.bool_())])
Get the indexes of the fields named ‘animals’:
>>> schema.get_all_field_indices("animals")
[1, 2]
- get_field_index(self, name)
Return index of the unique field with the given name.
- Parameters:
  - name : str
    The name of the field to look up.
- Returns:
  - index : int
    The index of the field with the given name; -1 if the name isn’t found or there are several fields with the given name.
Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])
Get the index of the field named ‘animals’:
>>> schema.get_field_index("animals")
1
Index in case of several fields with the given name:
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string()),
...     pa.field('animals', pa.bool_())],
...     metadata={"n_legs": "Number of legs per animal"})
>>> schema.get_field_index("animals")
-1
- insert(self, int i, Field field)
Add a field at position i to the schema.
Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])
Insert a new field on the second position:
>>> schema.insert(1, pa.field('extra', pa.bool_()))
n_legs: int64
extra: bool
animals: string
- metadata
The schema’s metadata (if any is set).
Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())],
...     metadata={"n_legs": "Number of legs per animal"})
Get the schema’s metadata:
>>> schema.metadata
{b'n_legs': b'Number of legs per animal'}
- names
The schema’s field names.
Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])
Get the names of the schema’s fields:
>>> schema.names
['n_legs', 'animals']
- pandas_metadata
Return deserialized-from-JSON pandas metadata field (if it exists)
Examples
>>> import pyarrow as pa
>>> import pandas as pd
>>> df = pd.DataFrame({'n_legs': [2, 4, 5, 100],
...                    'animals': ["Flamingo", "Horse", "Brittle stars", "Centipede"]})
>>> schema = pa.Table.from_pandas(df).schema
Select pandas metadata field from Arrow Schema:
>>> schema.pandas_metadata
{'index_columns': [{'kind': 'range', 'name': None, 'start': 0, 'stop': 4, 'step': 1}], ...
- remove(self, int i)
Remove the field at index i from the schema.
Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])
Remove the second field of the schema:
>>> schema.remove(1)
n_legs: int64
- remove_metadata(self)
Create new schema without metadata, if any
- Returns:
  - schema : pyarrow.Schema
Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())],
...     metadata={"n_legs": "Number of legs per animal"})
>>> schema
n_legs: int64
animals: string
-- schema metadata --
n_legs: 'Number of legs per animal'
Create a new schema without the original’s metadata:
>>> schema.remove_metadata()
n_legs: int64
animals: string
- serialize(self, memory_pool=None)
Write Schema to Buffer as encapsulated IPC message
- Parameters:
  - memory_pool : MemoryPool, default None
    Uses default memory pool if not specified
- Returns:
  - serialized : Buffer
Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])
Write schema to Buffer:
>>> schema.serialize()
<pyarrow.Buffer address=0x... size=... is_cpu=True is_mutable=True>
- set(self, int i, Field field)
Replace a field at position i in the schema.
Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])
Replace the second field of the schema with a new field ‘replaced’:
>>> schema.set(1, pa.field('replaced', pa.bool_()))
n_legs: int64
replaced: bool
- to_string(self, truncate_metadata=True, show_field_metadata=True, show_schema_metadata=True)
Return human-readable representation of Schema
- types
The schema’s field types.
Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])
Get the types of the schema’s fields:
>>> schema.types
[DataType(int64), DataType(string)]
- with_metadata(self, metadata)
Add metadata as dict of string keys and values to Schema
- Parameters:
  - metadata : dict
    Keys and values must be string-like / coercible to bytes
- Returns:
  - schema : pyarrow.Schema
Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])
Add metadata to an existing schema:
>>> schema.with_metadata({"n_legs": "Number of legs per animal"})
n_legs: int64
animals: string
-- schema metadata --
n_legs: 'Number of legs per animal'