NpyString API

New in version 2.0.

This API provides access to the UTF-8 string data stored in NumPy StringDType arrays. See NEP-55 for more detail on the design of StringDType.

Examples

Loading a String

Say we are writing a ufunc implementation for StringDType. If we are given a const char *buf pointer to the beginning of a StringDType array entry, and a PyArray_Descr * pointer to the array descriptor, we can access the underlying string data like so:

    npy_string_allocator *allocator = NpyString_acquire_allocator(
            (PyArray_StringDTypeObject *)descr);

    npy_static_string sdata = {0, NULL};
    const npy_packed_static_string *packed_string =
            (const npy_packed_static_string *)buf;

    int is_null = NpyString_load(allocator, packed_string, &sdata);

    if (is_null == -1) {
        // failed to load string: release the allocator, then set an error
        NpyString_release_allocator(allocator);
        return -1;
    }
    else if (is_null) {
        // handle missing string
        // sdata.buf is NULL
        // sdata.size is 0
    }
    else {
        // sdata.buf is a pointer to the beginning of a string
        // sdata.size is the size of the string in bytes
    }

    NpyString_release_allocator(allocator);

Packing a String

This example shows how to pack a new string entry into an array:

    const char *str = "Hello world";
    size_t size = 11;
    npy_packed_static_string *packed_string = (npy_packed_static_string *)buf;

    npy_string_allocator *allocator = NpyString_acquire_allocator(
            (PyArray_StringDTypeObject *)descr);

    // copy contents of str into packed_string
    if (NpyString_pack(allocator, packed_string, str, size) == -1) {
        // string packing failed: release the allocator, then set an error
        NpyString_release_allocator(allocator);
        return -1;
    }

    // packed_string contains a copy of "Hello world"

    NpyString_release_allocator(allocator);

Types

type npy_packed_static_string

An opaque struct that represents "packed" encoded strings. Individual entries in array buffers are instances of this struct. Direct access to the data in the struct is undefined, and future versions of the library may change the packed representation of strings.

type npy_static_string

An unpacked string allowing access to the UTF-8 string data.

    typedef struct npy_unpacked_static_string {
        size_t size;
        const char *buf;
    } npy_static_string;

size_t size

The size of the string, in bytes.
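Because size counts bytes rather than characters, a string containing non-ASCII text has fewer code points than bytes. The following is a minimal sketch of counting code points in such a buffer; the helper name example_utf8_codepoints is hypothetical and not part of the NumPy API, and the sketch assumes the buffer holds valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of NumPy: count the Unicode code points in
 * a UTF-8 buffer of `size` bytes. Assumes valid UTF-8, where continuation
 * bytes match the bit pattern 10xxxxxx and every other byte starts a new
 * code point. */
static size_t example_utf8_codepoints(const char *buf, size_t size)
{
    assert(buf != NULL || size == 0);

    size_t count = 0;
    for (size_t i = 0; i < size; i++) {
        unsigned char byte = (unsigned char)buf[i];
        if ((byte & 0xC0) != 0x80) {
            /* Not a continuation byte, so a new code point begins here. */
            count++;
        }
    }
    return count;
}
```

With an unpacked string sdata, this would be called as example_utf8_codepoints(sdata.buf, sdata.size).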

const char *buf

The string buffer. Holds UTF-8-encoded bytes. The buffer is not currently null-terminated, but we may decide to add null termination in the future, so do not rely on the presence or absence of null termination.

Note that this is a const buffer. If you want to alter an entry in an array, you should create a new string and pack it into the array entry.
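Since the buffer cannot be assumed to be null-terminated, code that needs a conventional C string has to copy size bytes and append the terminator itself. The following is a minimal sketch of that copy. It uses a local stand-in struct, example_static_string, that mirrors the layout documented above so the sketch compiles without the NumPy headers; both the struct and the helper example_to_cstring are hypothetical names, not part of the NumPy API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local stand-in mirroring the documented npy_static_string layout
 * (an assumption made so this sketch is self-contained). */
typedef struct {
    size_t size;
    const char *buf;
} example_static_string;

/* Hypothetical helper: return a newly allocated, null-terminated copy of
 * the view, or NULL if allocation fails. The caller must free() the
 * result. */
static char *example_to_cstring(const example_static_string *view)
{
    assert(view != NULL);

    if (view->size == SIZE_MAX) {
        /* size + 1 would overflow */
        return NULL;
    }
    char *copy = malloc(view->size + 1);
    if (copy == NULL) {
        return NULL;
    }
    if (view->size > 0) {
        /* Copy exactly size bytes; never read past the end of the view. */
        memcpy(copy, view->buf, view->size);
    }
    copy[view->size] = '\0';
    return copy;
}
```

If the string data itself contains embedded zero bytes, functions such as strlen will stop at the first of them when applied to the copy, so keep the original size alongside the copy when that matters.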

type npy_string_allocator

An opaque pointer to an object that handles string allocation. Before using the allocator, you must acquire the allocator lock, and you must release the lock once you are done interacting with strings managed by the allocator.

type PyArray_StringDTypeObject

The C struct backing instances of StringDType in Python. Attributes store the settings the object was created with, an instance of npy_string_allocator that manages string allocations for arrays associated with the DType instance, and several attributes caching information about the missing string object that is commonly needed in cast and ufunc loop implementations.

    typedef struct {
        PyArray_Descr base;
        PyObject *na_object;
        char coerce;
        char has_nan_na;
        char has_string_na;
        char array_owned;
        npy_static_string default_string;
        npy_static_string na_name;
        npy_string_allocator *allocator;
    } PyArray_StringDTypeObject;
PyArray_Descr base

The base object. Use this member to access fields common to all descriptor objects.

PyObject *na_object

A reference to the object representing the null value. If there is no null value (the default), this will be NULL.

char coerce

1 if string coercion is enabled, 0 otherwise.

char has_nan_na

1 if the missing string object (if any) is NaN-like, 0 otherwise.

char has_string_na

1 if the missing string object (if any) is a string, 0 otherwise.

char array_owned

1 if an array owns the StringDType instance, 0 otherwise.

npy_static_string default_string

The default string to use in operations. If the missing string object is a string, this will contain the string data for the missing string.

npy_static_string na_name

The name of the missing string object, if any. An empty string otherwise.

npy_string_allocator *allocator

The allocator instance associated with the array that owns this descriptor instance. The allocator should only be accessed directly after acquiring the allocator_lock, and the lock should be released as soon as the allocator is no longer needed.

Functions

npy_string_allocator *NpyString_acquire_allocator(const PyArray_StringDTypeObject *descr)

Acquire the mutex locking the allocator attached to descr. NpyString_release_allocator must be called on the allocator returned by this function exactly once. Note that functions requiring the GIL should not be called while the allocator mutex is held, as doing so may cause deadlocks.

void NpyString_acquire_allocators(size_t n_descriptors, PyArray_Descr *const descrs[], npy_string_allocator *allocators[])

Simultaneously acquire the mutexes locking the allocators attached to multiple descriptors. Writes a pointer to the associated allocator in the allocators array for each StringDType descriptor in the array. If any of the descriptors are not StringDType instances, writes NULL to the allocators array for that entry.

n_descriptors is the number of descriptors in the descrs array that should be examined. Any descriptors beyond the first n_descriptors elements are ignored. A buffer overflow will happen if the descrs array does not contain n_descriptors elements.

If pointers to the same descriptor are passed multiple times, the allocator mutex is acquired only once, and the same allocator pointer is written to each corresponding entry of the allocators array. The allocator mutexes must be released after this function returns; see NpyString_release_allocators.

Note that functions requiring the GIL should not be called while the allocator mutex is held, as doing so may cause deadlocks.

void NpyString_release_allocator(npy_string_allocator *allocator)

Release the mutex locking an allocator. This must be called exactly once after acquiring the allocator mutex and all operations requiring the allocator are done.

If you need to release multiple allocators, see NpyString_release_allocators, which can correctly handle releasing the allocator once when given several references to the same allocator.

void NpyString_release_allocators(size_t length, npy_string_allocator *allocators[])

Release the mutexes locking N allocators. length is the length of the allocators array. NULL entries are ignored.

If pointers to the same allocator are passed multiple times, the allocator mutex is released only once.

int NpyString_load(npy_string_allocator *allocator, const npy_packed_static_string *packed_string, npy_static_string *unpacked_string)

Extract the packed contents of packed_string into unpacked_string.

The unpacked_string is a read-only view onto the packed_string data and should not be used to modify the string data. If packed_string is the null string, sets the buf member of unpacked_string to NULL. Returns -1 if unpacking the string fails, returns 1 if packed_string is the null string, and returns 0 otherwise.

A useful pattern is to define a stack-allocated npy_static_string instance initialized to {0, NULL} and pass a pointer to the stack-allocated unpacked string to this function. This function can be used to simultaneously unpack a string and determine if it is a null string.
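Once two entries have been unpacked this way, a common next step in a ufunc loop is to compare them. Because the views carry an explicit size and are not guaranteed to be null-terminated, the comparison should use the sizes and memcmp rather than strcmp. The following is a minimal sketch under that assumption. It uses a local stand-in struct, example_static_string, that mirrors the documented npy_static_string layout so it compiles without the NumPy headers; the struct and the helper example_strings_equal are hypothetical names, not part of the NumPy API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local stand-in mirroring the documented npy_static_string layout
 * (an assumption made so this sketch is self-contained). */
typedef struct {
    size_t size;
    const char *buf;
} example_static_string;

/* Hypothetical helper: return 1 if both views hold the same bytes,
 * 0 otherwise. */
static int example_strings_equal(const example_static_string *a,
                                 const example_static_string *b)
{
    assert(a != NULL && b != NULL);

    if (a->size != b->size) {
        return 0;
    }
    if (a->size == 0) {
        /* Both are empty; avoid calling memcmp with a NULL buf. */
        return 1;
    }
    return memcmp(a->buf, b->buf, a->size) == 0;
}
```

Note that a null string unpacks to a view with a size of 0 and a NULL buf, which this helper treats as equal to an empty string. Code that needs to tell the two apart should check the return value of NpyString_load before comparing.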

int NpyString_pack_null(npy_string_allocator *allocator, npy_packed_static_string *packed_string)

Pack the null string into packed_string. Returns 0 on success and -1 on failure.

int NpyString_pack(npy_string_allocator *allocator, npy_packed_static_string *packed_string, const char *buf, size_t size)

Copy and pack the first size bytes of the buffer pointed to by buf into packed_string. Returns 0 on success and -1 on failure.