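pyarrow.ExtensionType
- class pyarrow.ExtensionType(DataType storage_type, extension_name)
Bases: BaseExtensionType
Concrete base class for Python-defined extension types.
- Parameters:
- storage_type
DataType The underlying storage type.
- extension_name
str The extension type name.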
Examples
Define a RationalType extension type subclassing ExtensionType:
>>> import pyarrow as pa
>>> class RationalType(pa.ExtensionType):
...     def __init__(self, data_type: pa.DataType):
...         if not pa.types.is_integer(data_type):
...             raise TypeError(f"data_type must be an integer type not {data_type}")
...         super().__init__(
...             pa.struct(
...                 [
...                     ("numer", data_type),
...                     ("denom", data_type),
...                 ],
...             ),
...             # N.B. This name does _not_ reference `data_type` so deserialization
...             # will work for _any_ integer `data_type` after registration
...             "my_package.rational",
...         )
...     def __arrow_ext_serialize__(self) -> bytes:
...         # No parameters are necessary
...         return b""
...     @classmethod
...     def __arrow_ext_deserialize__(cls, storage_type, serialized):
...         # return an instance of this subclass
...         return RationalType(storage_type[0].type)
Register the extension type:
>>> pa.register_extension_type(RationalType(pa.int64()))
Create an instance of RationalType extension type:
>>> rational_type = RationalType(pa.int32())
Inspect the extension type:
>>> rational_type.extension_name
'my_package.rational'
>>> rational_type.storage_type
StructType(struct<numer: int32, denom: int32>)
Wrap an array as an extension array:
>>> storage_array = pa.array(
...     [
...         {"numer": 10, "denom": 17},
...         {"numer": 20, "denom": 13},
...     ],
...     type=rational_type.storage_type
... )
>>> rational_array = rational_type.wrap_array(storage_array)
>>> rational_array
<pyarrow.lib.ExtensionArray object at ...>
-- is_valid: all not null
-- child 0 type: int32
  [
    10,
    20
  ]
-- child 1 type: int32
  [
    17,
    13
  ]
Or do the same with creating an ExtensionArray:
>>> rational_array = pa.ExtensionArray.from_storage(rational_type, storage_array)
>>> rational_array
<pyarrow.lib.ExtensionArray object at ...>
-- is_valid: all not null
-- child 0 type: int32
  [
    10,
    20
  ]
-- child 1 type: int32
  [
    17,
    13
  ]
Unregister the extension type:
>>> pa.unregister_extension_type("my_package.rational")
Note that even though we registered the concrete type RationalType(pa.int64()), PyArrow will be able to deserialize RationalType(integer_type) for any integer_type, as the deserializer will reference the name my_package.rational and the @classmethod __arrow_ext_deserialize__.
- __init__()
Initialize an extension type instance.
This should be called at the end of the subclass’ __init__ method.
Methods
__init__()                                 Initialize an extension type instance.
equals(self, other, *[, check_metadata])   Return true if type is equivalent to passed value.
field(self, i)
to_pandas_dtype(self)                      Return the equivalent NumPy / Pandas dtype.
wrap_array(self, storage)                  Wrap the given storage array as an extension array.
Attributes
bit_width             The bit width of the extension type.
byte_width            The byte width of the extension type.
extension_name        The extension type name.
has_variadic_buffers  If True, the number of expected buffers is only lower-bounded by num_buffers.
id
num_buffers           Number of data buffers required to construct Array type excluding children.
num_fields            The number of child fields.
storage_type          The underlying storage type.
- bit_width
The bit width of the extension type.
- byte_width
The byte width of the extension type.
- equals(self, other, *, check_metadata=False)
Return true if type is equivalent to passed value.
- Parameters:
- other
- check_metadata
bool, default False
- Returns:
- is_equal
bool
Examples
>>> import pyarrow as pa
>>> pa.int64().equals(pa.string())
False
>>> pa.int64().equals(pa.int64())
True
- extension_name
The extension type name.
- has_variadic_buffers
If True, the number of expected buffers is only lower-bounded by num_buffers.
Examples
>>> import pyarrow as pa
>>> pa.int64().has_variadic_buffers
False
>>> pa.string_view().has_variadic_buffers
True
- id
- num_buffers
Number of data buffers required to construct Array type excluding children.
Examples
>>> import pyarrow as pa
>>> pa.int64().num_buffers
2
>>> pa.string().num_buffers
3
- num_fields
The number of child fields.
Examples
>>> import pyarrow as pa
>>> pa.int64()
DataType(int64)
>>> pa.int64().num_fields
0
>>> pa.list_(pa.string())
ListType(list<item: string>)
>>> pa.list_(pa.string()).num_fields
1
>>> struct = pa.struct({'x': pa.int32(), 'y': pa.string()})
>>> struct.num_fields
2
- storage_type
The underlying storage type.
- to_pandas_dtype(self)
Return the equivalent NumPy / Pandas dtype.
Examples
>>> import pyarrow as pa
>>> pa.int64().to_pandas_dtype()
<class 'numpy.int64'>
- wrap_array(self, storage)
Wrap the given storage array as an extension array.
- Parameters:
- storage
Array or ChunkedArray
- Returns:
- array
Array or ChunkedArray Extension array wrapping the storage array

