pyarrow.ExtensionType

class pyarrow.ExtensionType(DataType storage_type, extension_name)

Bases: BaseExtensionType

Concrete base class for Python-defined extension types.

Parameters:
storage_type : DataType

The underlying storage type for the extension type.

extension_name : str

A unique name distinguishing this extension type. The name will be used when deserializing IPC data.

Examples

Define a RationalType extension type subclassing ExtensionType:

>>> import pyarrow as pa
>>> class RationalType(pa.ExtensionType):
...     def __init__(self, data_type: pa.DataType):
...         if not pa.types.is_integer(data_type):
...             raise TypeError(f"data_type must be an integer type not {data_type}")
...         super().__init__(
...             pa.struct(
...                 [
...                     ("numer", data_type),
...                     ("denom", data_type),
...                 ],
...             ),
...             # N.B. This name does _not_ reference `data_type` so deserialization
...             # will work for _any_ integer `data_type` after registration
...             "my_package.rational",
...         )
...     def __arrow_ext_serialize__(self) -> bytes:
...         # No parameters are necessary
...         return b""
...     @classmethod
...     def __arrow_ext_deserialize__(cls, storage_type, serialized):
...         # return an instance of this subclass
...         return RationalType(storage_type[0].type)

Register the extension type:

>>> pa.register_extension_type(RationalType(pa.int64()))

Create an instance of RationalType extension type:

>>> rational_type = RationalType(pa.int32())

Inspect the extension type:

>>> rational_type.extension_name
'my_package.rational'
>>> rational_type.storage_type
StructType(struct<numer: int32, denom: int32>)

Wrap an array as an extension array:

>>> storage_array = pa.array(
...     [
...         {"numer": 10, "denom": 17},
...         {"numer": 20, "denom": 13},
...     ],
...     type=rational_type.storage_type
... )
>>> rational_array = rational_type.wrap_array(storage_array)
>>> rational_array
<pyarrow.lib.ExtensionArray object at ...>
-- is_valid: all not null
-- child 0 type: int32
  [
    10,
    20
  ]
-- child 1 type: int32
  [
    17,
    13
  ]

Or do the same by creating an ExtensionArray directly:

>>> rational_array = pa.ExtensionArray.from_storage(rational_type, storage_array)
>>> rational_array
<pyarrow.lib.ExtensionArray object at ...>
-- is_valid: all not null
-- child 0 type: int32
  [
    10,
    20
  ]
-- child 1 type: int32
  [
    17,
    13
  ]

Unregister the extension type:

>>> pa.unregister_extension_type("my_package.rational")

Note that even though we registered the concrete type RationalType(pa.int64()), PyArrow will be able to deserialize RationalType(integer_type) for any integer_type, as the deserializer will reference the name my_package.rational and the @classmethod __arrow_ext_deserialize__.

__init__()

Initialize an extension type instance.

This should be called at the end of the subclass' __init__ method.

Methods

__init__

Initialize an extension type instance.

equals(self, other, *[, check_metadata])

Return true if type is equivalent to passed value.

field(self, i)

to_pandas_dtype(self)

Return the equivalent NumPy / Pandas dtype.

wrap_array(self, storage)

Wrap the given storage array as an extension array.

Attributes

bit_width

The bit width of the extension type.

byte_width

The byte width of the extension type.

extension_name

The extension type name.

has_variadic_buffers

If True, the number of expected buffers is only lower-bounded by num_buffers.

id

num_buffers

Number of data buffers required to construct Array type excluding children.

num_fields

The number of child fields.

storage_type

The underlying storage type.

bit_width

The bit width of the extension type.

byte_width

The byte width of the extension type.

equals(self, other, *, check_metadata=False)

Return true if type is equivalent to passed value.

Parameters:
other : DataType or str convertible to DataType
check_metadata : bool

Whether nested Field metadata equality should be checked as well.

Returns:
is_equal : bool

Examples

>>> import pyarrow as pa
>>> pa.int64().equals(pa.string())
False
>>> pa.int64().equals(pa.int64())
True

extension_name

The extension type name.

field(self, i) → Field

Parameters:
i : int
Returns:
pyarrow.Field

has_variadic_buffers

If True, the number of expected buffers is only lower-bounded by num_buffers.

Examples

>>> import pyarrow as pa
>>> pa.int64().has_variadic_buffers
False
>>> pa.string_view().has_variadic_buffers
True

id

num_buffers

Number of data buffers required to construct Array type excluding children.

Examples

>>> import pyarrow as pa
>>> pa.int64().num_buffers
2
>>> pa.string().num_buffers
3

num_fields

The number of child fields.

Examples

>>> import pyarrow as pa
>>> pa.int64()
DataType(int64)
>>> pa.int64().num_fields
0
>>> pa.list_(pa.string())
ListType(list<item: string>)
>>> pa.list_(pa.string()).num_fields
1
>>> struct = pa.struct({'x': pa.int32(), 'y': pa.string()})
>>> struct.num_fields
2

storage_type

The underlying storage type.

to_pandas_dtype(self)

Return the equivalent NumPy / Pandas dtype.

Examples

>>> import pyarrow as pa
>>> pa.int64().to_pandas_dtype()
<class 'numpy.int64'>

wrap_array(self, storage)

Wrap the given storage array as an extension array.

Parameters:
storage : Array or ChunkedArray
Returns:
array : Array or ChunkedArray

Extension array wrapping the storage array.