This PEP is a historical document. The up-to-date, canonical documentation can now be found at the Export API and the PyLongWriter API.
See PEP 1 for how to propose changes.
Add a new C API to import and export Python integers (int objects), in particular the PyLongWriter_Create() and PyLong_Export() functions.
Projects such as gmpy2, SAGE and Python-FLINT directly access Python “internals” (the PyLongObject structure) or use an inefficient temporary format (hex strings for Python-FLINT) to import and export Python int objects. The Python int implementation changed in Python 3.12 to add a tag and “compact values”.
In the 3.13 alpha 1 release, the private undocumented _PyLong_New() function was removed, although these projects use it to import Python integers. The private function was restored in 3.13 alpha 2.
A public, efficient abstraction is needed to interface Python with these projects without exposing implementation details. It would allow Python to change its internals without breaking these projects. For example, the gmpy2 implementation had to be changed for CPython 3.9 and again for CPython 3.12.
Data needed by GMP-like import-export functions.
Use PyLong_GetNativeLayout() to get the native layout of Python int objects, used internally for integers with “big enough” absolute value.
See also sys.int_info, which exposes similar information to Python.
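digits_order: order of the digits in the array:
* 1 for most significant digit first
* -1 for least significant digit first
digit_endianness: byte order within a digit:
* 1 for most significant byte first (big endian)
* -1 for least significant byte first (little endian)
PyLong_GetNativeLayout(): get the native layout of Python int objects. See the PyLongLayout structure.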
The function must not be called before Python initialization nor after Python finalization. The returned layout is valid until Python is finalized. The layout is the same for all Python sub-interpreters, so it can be cached.
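The layout members are enough to derive the two quantities that GMP-style import/export calls need: the number of unused “nail” bits per digit and the number of digits for a magnitude of a given bit length. The following is a minimal sketch, not the PEP's implementation. `layout_sketch`, `layout_nails()` and `layout_digits_needed()` are hypothetical names, and the struct is a local stand-in for PyLongLayout so that the sketch builds without `Python.h`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in mirroring the PyLongLayout members described above,
   so that this sketch builds without Python.h. */
struct layout_sketch {
    uint8_t bits_per_digit;
    uint8_t digit_size;
    int8_t digits_order;
    int8_t digit_endianness;
};

/* Unused high bits in each digit ("nails" in GMP terminology). */
static size_t
layout_nails(const struct layout_sketch *layout)
{
    assert(layout->digit_size * 8 >= layout->bits_per_digit);
    return (size_t)layout->digit_size * 8 - layout->bits_per_digit;
}

/* Number of digits needed for a magnitude with `nbits` significant bits
   (ceiling division by bits_per_digit). */
static size_t
layout_digits_needed(const struct layout_sketch *layout, size_t nbits)
{
    return (nbits + layout->bits_per_digit - 1) / layout->bits_per_digit;
}
```

With 30-bit digits stored in 4 bytes (an assumption about common 64-bit CPython builds; real code should read the values from PyLong_GetNativeLayout()), each digit has 2 nail bits and a 64-bit magnitude needs 3 digits.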
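PyLongExport: export of a Python int object. There are two cases:
* If digits is NULL, only use the value member.
* If digits is not NULL, use the negative, ndigits and digits members.
Members:
* value: the native integer value of the exported int object. Only valid if digits is NULL.
* negative: 1 if the number is negative, 0 otherwise. Only valid if digits is not NULL.
* ndigits: number of digits in the digits array. Only valid if digits is not NULL.
* digits: read-only array of unsigned digits. Can be NULL.
If PyLongExport.digits is not NULL, a private field of the PyLongExport structure stores a strong reference to the Python int object to make sure that the structure remains valid until PyLong_FreeExport() is called.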
PyLong_Export(obj, export_long): export a Python int object. export_long must point to a PyLongExport structure allocated by the caller. It must not be NULL.
On success, fill in *export_long and return 0. On error, set an exception and return -1.
PyLong_FreeExport() must be called when the export is no longer needed.
CPython implementation detail: This function always succeeds if obj is a Python int object or a subclass.
On CPython 3.14, no memory copy is needed in PyLong_Export(); it’s just a thin wrapper that exposes the internal digits array of the Python int.
PyLong_FreeExport(export_long): release the export export_long created by PyLong_Export(). CPython implementation detail: Calling PyLong_FreeExport() is optional if export_long->digits is NULL.
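The two export cases can be illustrated with a small consumer that converts an export to a `double`. This is a sketch under stated assumptions, not the PEP's code: `export_sketch` is a hypothetical stand-in for the PyLongExport members (so the example builds without `Python.h`), and the layout is assumed to be 30-bit digits stored in `uint32_t`, least significant digit first. Real code should take the layout from PyLong_GetNativeLayout().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the PyLongExport members described above. */
struct export_sketch {
    int64_t value;
    uint8_t negative;
    ptrdiff_t ndigits;
    const void *digits;
};

/* Assumed layout: 30-bit digits in uint32_t, least significant digit first. */
enum { SKETCH_BITS_PER_DIGIT = 30 };

/* Convert an export to a double, handling both cases. */
static double
export_to_double(const struct export_sketch *e)
{
    if (e->digits == NULL) {
        /* Case 1: the integer is carried by the `value` member. */
        return (double)e->value;
    }

    /* Case 2: combine the digits, from most to least significant. */
    const uint32_t *d = (const uint32_t *)e->digits;
    double result = 0.0;
    for (ptrdiff_t i = e->ndigits - 1; i >= 0; i--) {
        result = result * (double)(1u << SKETCH_BITS_PER_DIGIT) + (double)d[i];
    }
    return e->negative ? -result : result;
}
```

A `double` loses precision for large magnitudes; it is used here only to keep the sketch short. The point is the branch on `digits`, which is the same branch the gmpy2 code takes before calling `mpz_import()`.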
The PyLongWriter API can be used to import an integer.
PyLongWriter: a Python int writer instance. The instance must be destroyed by PyLongWriter_Finish() or PyLongWriter_Discard().
PyLongWriter_Create(negative, ndigits, digits): create a PyLongWriter. On success, allocate *digits and return a writer. On error, set an exception and return NULL.
negative is 1 if the number is negative, or 0 otherwise.
ndigits is the number of digits in the digits array. It must be greater than 0.
digits must not be NULL.
After a successful call to this function, the caller should fill in the digits array and then call PyLongWriter_Finish() to get a Python int. The layout of digits is described by PyLong_GetNativeLayout().
Digits must be in the range [0; (1 << bits_per_digit) - 1] (where bits_per_digit is the number of bits per digit). Any unused most significant digits must be set to 0.
Alternatively, call PyLongWriter_Discard() to destroy the writer instance without creating an int object.
On CPython 3.14, the PyLongWriter_Create() implementation is a thin wrapper to the private _PyLong_New() function.
PyLongWriter_Finish(writer): finish a PyLongWriter created by PyLongWriter_Create(). On success, return a Python int object. On error, set an exception and return NULL.
The function takes care of normalizing the digits and converts the object to a compact integer if needed.
The writer instance and the digits array are invalid after the call.
PyLongWriter_Discard(writer): discard a PyLongWriter created by PyLongWriter_Create(). writer must not be NULL.
The writer instance and the digits array are invalid after the call.
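The constraints on the digits array (each digit within range, unused most significant digits set to 0) can be illustrated by splitting a 64-bit magnitude into digits. This is a minimal sketch with a hypothetical helper, `fill_digits()`, assuming 30-bit digits stored in `uint32_t`, least significant digit first. In real code the array would be the one allocated by PyLongWriter_Create(), and the layout would come from PyLong_GetNativeLayout().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: 30-bit digits in uint32_t, least significant digit first. */
enum { SKETCH_BITS_PER_DIGIT = 30 };

/* Split `magnitude` into `ndigits` digits.
   Every digit is written, so unused most significant digits end up as 0.
   Return 0 on success, or -1 if the array is too small for the value. */
static int
fill_digits(uint64_t magnitude, uint32_t *digits, size_t ndigits)
{
    const uint32_t mask = ((uint32_t)1 << SKETCH_BITS_PER_DIGIT) - 1;

    assert(ndigits > 0);
    for (size_t i = 0; i < ndigits; i++) {
        digits[i] = (uint32_t)(magnitude & mask);  /* stays in [0; mask] */
        magnitude >>= SKETCH_BITS_PER_DIGIT;
    }
    return magnitude == 0 ? 0 : -1;
}
```

For example, 1 << 38 with three digits yields the digits 0, 256 and 0: the value is 256 * 2**30, and the third digit is an unused most significant digit.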
The proposed import API is efficient for large integers. Compared to accessing Python internals directly, it can have a significant performance overhead on small integers.
For small integers of a few digits (for example, 1 or 2 digits), existing APIs can be used.
Code:

    /* Query parameters of Python’s internal representation of integers. */
    const PyLongLayout *layout = PyLong_GetNativeLayout();

    size_t int_digit_size = layout->digit_size;
    int int_digits_order = layout->digits_order;
    size_t int_bits_per_digit = layout->bits_per_digit;
    size_t int_nails = int_digit_size*8 - int_bits_per_digit;
    int int_endianness = layout->digit_endianness;
Export: PyLong_Export() with gmpy2

Code:

    static int
    mpz_set_PyLong(mpz_t z, PyObject *obj)
    {
        static PyLongExport long_export;

        if (PyLong_Export(obj, &long_export) < 0) {
            return -1;
        }

        if (long_export.digits) {
            mpz_import(z, long_export.ndigits, int_digits_order, int_digit_size,
                       int_endianness, int_nails, long_export.digits);
            if (long_export.negative) {
                mpz_neg(z, z);
            }
            PyLong_FreeExport(&long_export);
        }
        else {
            const int64_t value = long_export.value;

            if (LONG_MIN <= value && value <= LONG_MAX) {
                mpz_set_si(z, value);
            }
            else {
                mpz_import(z, 1, -1, sizeof(int64_t), 0, 0, &value);
                if (value < 0) {
                    mpz_t tmp;
                    mpz_init(tmp);
                    mpz_ui_pow_ui(tmp, 2, 64);
                    mpz_sub(z, z, tmp);
                    mpz_clear(tmp);
                }
            }
        }
        return 0;
    }
Reference code: mpz_set_PyLong() in the gmpy2 master for commit 9177648.
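The last branch of the code above imports the raw 64-bit pattern of a negative `int64_t` as an unsigned number and then subtracts 2**64. The arithmetic behind that fix-up can be checked in isolation: if u is the unsigned bit pattern of a negative value, then value = u - 2**64, so the magnitude is 2**64 - u. The sketch below uses a hypothetical helper, `magnitude_of_negative()`, and plain C integers instead of GMP.

```c
#include <assert.h>
#include <stdint.h>

/* Magnitude of a negative int64_t, derived from its raw 64-bit pattern.
   With u the unsigned bit pattern, value == u - 2**64, so
   |value| == 2**64 - u, which is 0 - u in modulo-2**64 arithmetic. */
static uint64_t
magnitude_of_negative(int64_t value)
{
    assert(value < 0);
    uint64_t u = (uint64_t)value;  /* the bit pattern an unsigned import sees */
    return (uint64_t)0 - u;
}
```

This branch only matters on platforms where `long` is narrower than 64 bits; elsewhere every `int64_t` value passes the LONG_MIN/LONG_MAX range check and is handled by `mpz_set_si()`.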
Benchmark:
    import pyperf
    from gmpy2 import mpz

    runner = pyperf.Runner()
    runner.bench_func('1<<7', mpz, 1 << 7)
    runner.bench_func('1<<38', mpz, 1 << 38)
    runner.bench_func('1<<300', mpz, 1 << 300)
    runner.bench_func('1<<3000', mpz, 1 << 3000)

Results on Linux Fedora 40 with CPU isolation, Python built in release mode:
| Benchmark | ref | pep757 |
|---|---|---|
| 1<<7 | 91.3 ns | 89.9 ns: 1.02x faster |
| 1<<38 | 120 ns | 94.9 ns: 1.27x faster |
| 1<<300 | 196 ns | 203 ns: 1.04x slower |
| 1<<3000 | 939 ns | 945 ns: 1.01x slower |
| Geometric mean | (ref) | 1.05x faster |
Import: PyLongWriter_Create() with gmpy2

Code:

    static PyObject *
    GMPy_PyLong_From_MPZ(MPZ_Object *obj, CTXT_Object *context)
    {
        if (mpz_fits_slong_p(obj->z)) {
            return PyLong_FromLong(mpz_get_si(obj->z));
        }

        size_t size = (mpz_sizeinbase(obj->z, 2) +
                       int_bits_per_digit - 1) / int_bits_per_digit;
        void *digits;
        PyLongWriter *writer = PyLongWriter_Create(mpz_sgn(obj->z) < 0, size,
                                                   &digits);
        if (writer == NULL) {
            return NULL;
        }

        mpz_export(digits, NULL, int_digits_order, int_digit_size,
                   int_endianness, int_nails, obj->z);

        return PyLongWriter_Finish(writer);
    }
Reference code: GMPy_PyLong_From_MPZ() in the gmpy2 master for commit 9177648.
Benchmark:
    import pyperf
    from gmpy2 import mpz

    runner = pyperf.Runner()
    runner.bench_func('1<<7', int, mpz(1 << 7))
    runner.bench_func('1<<38', int, mpz(1 << 38))
    runner.bench_func('1<<300', int, mpz(1 << 300))
    runner.bench_func('1<<3000', int, mpz(1 << 3000))

Results on Linux Fedora 40 with CPU isolation, Python built in release mode:
| Benchmark | ref | pep757 |
|---|---|---|
| 1<<7 | 56.7 ns | 56.2 ns: 1.01x faster |
| 1<<300 | 191 ns | 213 ns: 1.12x slower |
| Geometric mean | (ref) | 1.03x slower |
Benchmark hidden because not significant (2): 1<<38, 1<<3000.
There is no impact on backward compatibility: only new APIs are added.
It would be convenient to support arbitrary layouts to import and export Python integers.
For example, it was proposed to add a layout parameter to PyLongWriter_Create() and a layout member to the PyLongExport structure.
The problem is that this is more complex to implement and not really needed. What is strictly needed is only an API to import and export using the Python “native” layout.
If use cases for arbitrary layouts come up later, new APIs can be added.
Don’t add PyLong_GetNativeLayout() function

Currently, most of the information required for int import/export is already available via PyLong_GetInfo() (and sys.int_info). More could be added (like the order of digits), and this interface doesn’t pose any constraints on the future evolution of PyLongObject.
The problem is that PyLong_GetInfo() returns a Python object (a named tuple), not a convenient C structure, which might push people away from it and toward, for example, the current semi-private macros like PyLong_SHIFT and PyLong_BASE.
Another approach to importing and exporting data from int objects might be the following: C extensions provide contiguous buffers, and CPython exports the absolute value of an integer into them (or imports it from them).
API example:
    struct PyLongLayout {
        uint8_t bits_per_digit;
        uint8_t digit_size;
        int8_t digits_order;
    };

    size_t PyLong_GetDigitsNeeded(PyLongObject *obj, PyLongLayout layout);
    int PyLong_Export(PyLongObject *obj, PyLongLayout layout, void *buffer);
    PyLongObject *PyLong_Import(PyLongLayout layout, void *buffer);

This might work for GMP, as it has the mpz_limbs_read() and mpz_limbs_write() functions, which can provide the required access to the internals of mpz_t. Other libraries may need to use temporary buffers and then mpz_import/export-like functions on their side.
The major drawback of this approach is that it is much more complex on the CPython side (i.e. the actual conversion between different layouts). For example, the implementation of PyLong_FromNativeBytes() and PyLong_AsNativeBytes() (which together provide a restricted version of the required API) in CPython took ~500 LOC (cf. ~100 LOC in the current implementation).
Drop value field from the export API

With this suggestion, only one export type will exist (an array of “digits”). If such a view is not available for a given integer, it will either be emulated by the export functions or PyLong_Export() will return an error. In both cases, it is assumed that users will use other C API functions, like PyLong_AsLongAndOverflow(), to get “small enough” integers (i.e., integers that fit into some machine integer type). PyLong_Export() will be inefficient (or just fail) in this case.
An example:
    static int
    mpz_set_PyLong(mpz_t z, PyObject *obj)
    {
        int overflow;
    #if SIZEOF_LONG == 8
        long value = PyLong_AsLongAndOverflow(obj, &overflow);
    #else
        /* Windows has 32-bit long, so use 64-bit long long instead */
        long long value = PyLong_AsLongLongAndOverflow(obj, &overflow);
    #endif
        Py_BUILD_ASSERT(sizeof(value) == sizeof(int64_t));

        if (!overflow) {
            if (LONG_MIN <= value && value <= LONG_MAX) {
                mpz_set_si(z, (long)value);
            }
            else {
                mpz_import(z, 1, -1, sizeof(int64_t), 0, 0, &value);
                if (value < 0) {
                    mpz_t tmp;
                    mpz_init(tmp);
                    mpz_ui_pow_ui(tmp, 2, 64);
                    mpz_sub(z, z, tmp);
                    mpz_clear(tmp);
                }
            }
        }
        else {
            static PyLongExport long_export;

            if (PyLong_Export(obj, &long_export) < 0) {
                return -1;
            }
            mpz_import(z, long_export.ndigits, int_digits_order, int_digit_size,
                       int_endianness, int_nails, long_export.digits);
            if (long_export.negative) {
                mpz_neg(z, z);
            }
            PyLong_FreeExport(&long_export);
        }
        return 0;
    }
This might look like a simplification from the API designer’s point of view, but it will be less convenient for end users. They will have to follow Python development, benchmark different variants for exporting small integers (is it obvious why the above case was chosen instead of PyLong_AsInt64()?), and maybe support different code paths for various CPython versions or across different Python implementations.
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.
Source: https://github.com/python/peps/blob/main/peps/pep-0757.rst
Last modified: 2024-12-16 07:23:59 GMT