Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork33.7k
gh-102471, PEP 757: Add PyLong import and export API#121339
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
vstinner commentedJul 3, 2024 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
vstinner commentedJul 3, 2024
See also issue#111415 |
skirpichev left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Just my 2c.
The gmpy2 code, used for benchmarking, can be found in my fork:
https://github.com/skirpichev/gmpy/tree/trying-py-import-export
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Objects/longobject.c Outdated
| PyUnstable_Long_Export(PyObject*obj,PyUnstable_LongExport*long_export) | ||
| { | ||
| if (!PyLong_Check(obj)) { | ||
| PyErr_Format(PyExc_TypeError,"expect int, got %T",obj); | ||
| return-1; | ||
| } | ||
| PyLongObject*self= (PyLongObject*)obj; | ||
| long_export->obj= (PyLongObject*)Py_NewRef(obj); | ||
| long_export->negative=_PyLong_IsNegative(self); | ||
| long_export->ndigits=_PyLong_DigitCount(self); | ||
| if (long_export->ndigits==0) { | ||
| long_export->ndigits=1; | ||
| } | ||
| long_export->digits=self->long_value.ob_digit; | ||
| return0; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
As this mostly give a direct access to the PyLongObject - it almost as fast as using private stuff before.
Old code:
$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=10**2' 'mpz(x)'1000000 loops, best of 20: 232 nsec per loop$ python -m timeit -r11 -s 'from gmpy2 import mpz;x=10**100' 'mpz(x)'500000 loops, best of 11: 500 nsec per loop$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=10**1000' 'mpz(x)'100000 loops, best of 20: 2.53 usec per loopWith proposed API:
$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=10**2' 'mpz(x)'1000000 loops, best of 20: 258 nsec per loop$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=10**100' 'mpz(x)'500000 loops, best of 20: 528 nsec per loop$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=10**1000' 'mpz(x)'100000 loops, best of 20: 2.56 usec per loopObjects/longobject.c Outdated
| PyObject* | ||
| PyUnstable_Long_Import(intnegative,size_tndigits,Py_digit*digits) | ||
| { | ||
| return (PyObject*)_PyLong_FromDigits(negative,ndigits,digits); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
But this is something I would like to avoid. This requires allocation of a temporary buffer and using memcpy. Can we offer awritable layout to use it's digits in the mpz_export directly?
Benchmarks for old code:
$ python -m timeit -r11 -s 'from gmpy2 import mpz;x=mpz(10**2)' 'int(x)'2000000 loops, best of 11: 111 nsec per loop$ python -m timeit -r11 -s 'from gmpy2 import mpz;x=mpz(10**100)' 'int(x)'500000 loops, best of 11: 475 nsec per loop$ python -m timeit -r11 -s 'from gmpy2 import mpz;x=mpz(10**1000)' 'int(x)'100000 loops, best of 11: 2.39 usec per loopWith new API:
$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=mpz(10**2)' 'int(x)'2000000 loops, best of 20: 111 nsec per loop$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=mpz(10**100)' 'int(x)'500000 loops, best of 20: 578 nsec per loop$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=mpz(10**1000)' 'int(x)'100000 loops, best of 20: 2.53 usec per loopThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
This requires allocation of a temporary buffer and using memcpy.
Right, PyLongObject has to manage its own memory.
Can we offer a writable layout to use it's digits in the mpz_export directly?
That sounds strange from the Python point of view and make the internals "less opaque". I would prefer to leak less implementation details.
skirpichevJul 4, 2024 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Right, PyLongObject has to manage its own memory.
I'm not trying to change that. More complete proposal:vstinner#4
gmpy2 patch:https://github.com/skirpichev/gmpy/tree/trying-py-import-export-v2
New benchmarks:
$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=mpz(10**2)' 'int(x)'2000000 loops, best of 20: 111 nsec per loop$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=mpz(10**100)' 'int(x)'500000 loops, best of 20: 509 nsec per loop$ python -m timeit -r20 -s 'from gmpy2 import mpz;x=mpz(10**1000)' 'int(x)'100000 loops, best of 20: 2.44 usec per loopI would prefer to leak less implementation details.
I don't think this leak anything. It doesn't leak memory management details.PyLong_Import will just do allocation memory. Writting digits will be job for mpz_export, as before.
Without this, it seems - there are noticeable performance regression for integers of intermediate range. Something up to 20% vs 7% on my branch.
Edit: currently, proposedPyUnstable_Long_ReleaseImport() matchPyUnstable_Long_ReleaseExport(). Perhaps, it could be one function (say, PyUnstable_Long_ReleaseDigitArray()), but I unsure - maybe it puts some constraints on internals of the PyLongObject.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
skirpichev commentedJul 4, 2024
CC@tornaria, as Sage people might be interested in this feature. |
skirpichev commentedJul 4, 2024
CC@oscarbenjamin, you may want this for python-flint |
vstinner commentedJul 4, 2024
I updated my PR:
|
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Misc/NEWS.d/next/C API/2024-07-03-17-26-53.gh-issue-102471.XpmKYk.rst OutdatedShow resolvedHide resolved
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
oscarbenjamin commentedJul 4, 2024
Absolutely. Currently python-flint uses a hex-string as an intermediate format when converting between large int and |
Uh oh!
There was an error while loading.Please reload this page.
vstinner commentedJul 4, 2024
@skirpichev: I added a PyLongWriter API similar to what@encukou proposed. Example: PyLongObject*_PyLong_FromDigits(intnegative,Py_ssize_tdigit_count,digit*digits){PyLongWriter*writer=PyLongWriter_Create();if (writer==NULL) {returnNULL; }if (negative) {PyLongWriter_SetSign(writer,-1); }Py_digit*writer_digits=PyLongWriter_AllocDigits(writer,digit_count);if (writer_digits==NULL) { gotoerror; }memcpy(writer_digits,digits,digit_count*sizeof(digit));return (PyLongObject*)PyLongWriter_Finish(writer);error:PyLongWriter_Discard(writer);returnNULL;} The >>>import_testcapi;_testcapi.pylong_import(0, [100,0,0])is100True |
vstinner commentedJul 4, 2024
I mark the PR as a draft until we agree on the API. |
skirpichev commentedJul 4, 2024
Yes, that looks better and should fix speed regression. I'll try to benchmark that, perhaps tomorrow. But cost is 5 (!) public functions and one new struct, additionally to |
vstinner commentedJul 4, 2024
I updated the PR to remove the |
vstinner commentedJul 4, 2024
My concern is to avoid the problemcapi-workgroup/api-evolution#36 : avoid exposing |
vstinner commentedJul 4, 2024
@skirpichev: Would it be useful to add a |
picnixz left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
For users, I think it'd be great if they know (in the docs) whether something can be NULL or not, especially since we have a similar interface for unicode objects.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
| digit*digits=PyMem_Malloc((size_t)ndigits*sizeof(digit)); | ||
| if (digits==NULL) { | ||
| returnPyErr_NoMemory(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
The docs say that we should check for NULL but agreed that we could update them.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
| digit*digits=PyMem_Malloc((size_t)ndigits*sizeof(digit)); | ||
| if (digits==NULL) { | ||
| returnPyErr_NoMemory(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Maybe bug is somewhere else? Newly added functions usually are explicitly say if they set an exception on error.
Uh oh!
There was an error while loading.Please reload this page.
| intoverflow; | ||
| #ifSIZEOF_LONG==8 | ||
| longvalue=PyLong_AsLongAndOverflow(obj,&overflow); | ||
| #else | ||
| // Windows has 32-bit long, so use 64-bit long long instead | ||
| long longvalue=PyLong_AsLongLongAndOverflow(obj,&overflow); | ||
| #endif | ||
| Py_BUILD_ASSERT(sizeof(value)==sizeof(int64_t)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Another reason for using this API was "no-error" contract. Maybe we can specify that in docs as a CPython implementation detail? IIUC, we are free to change such things in new releases without deprecation period.
Note also, that gmpy2 benchmarks measure not just CPython side, but the whole conversion path (int->gmpy2.mpz in this case).
| If *export_long->digits* is not ``NULL``,:c:func:`PyLong_FreeExport` must | ||
| be called when the export is no longer needed. | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
| ..impl-detail:: | |
| This function always succeeds if *obj* is a Python:class:`int` object | |
| or a subclass. | |
Lets see if we can restore this in a that way. It might be helpful for e.g. Sage, which doesn't support PyPy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I would prefer to not add this note. It was controversial during PEP 757 design.
@serhiy-storchaka@encukou: What do you think? Would you be ok to declare that the PyLong_Export() function cannot fail if the argument is a Python int?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
It was controversial during PEP 757 design.
It was proposed unconditionally, not as CPython's implementation detail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I'm fine with it, as an implementation detail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
vstinner commentedDec 10, 2024
@picnixz: I addressed your review. Please review the updated PR. |
picnixz left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Final nits and LGTM. Since I'm on mobile I don't know whether the spaces are correct or not so please check it manually.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
vstinner commentedDec 11, 2024
Thanks for all reviews. I think that the change is now ready to be merged, I addressed all comments. I plan to merge the PR Friday. |
skirpichev commentedDec 11, 2024
I would appreciate this, but... Isn't now an approval from other core developer is required for this type of pr's? |
serhiy-storchaka left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Update alsoDoc/data/refcounts.dat.
| intoverflow; | ||
| #ifSIZEOF_LONG==8 | ||
| longvalue=PyLong_AsLongAndOverflow(obj,&overflow); | ||
| #else | ||
| // Windows has 32-bit long, so use 64-bit long long instead | ||
| long longvalue=PyLong_AsLongLongAndOverflow(obj,&overflow); | ||
| #endif | ||
| Py_BUILD_ASSERT(sizeof(value)==sizeof(int64_t)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
This code is so close to the PyLong implementation, that I think we should use_PyLong_IsCompact() +_PyLong_CompactValue() to be sure that it matches the specification.
Uh oh!
There was an error while loading.Please reload this page.
| if (export_long->ndigits==0) { | ||
| export_long->ndigits=1; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
It's to make the API easier to use: the consumer doesn't have to both with ndigits==0 special case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
In fact it's assumed (see e.g._PyLong_New), that the digit array always has at least one digit (and it's initialized for 0 too). And also, if you pass ndigits==0 to the mpz_import() ascount parameter, then it just calls malloc(0).
But I think it's safe just drop this check. This condition is unreachible here, as0 handled in the!overflow case.
| if (export_long->ndigits==0) { | |
| export_long->ndigits=1; | |
| } |
| PyLongWriter* | ||
| PyLongWriter_Create(intnegative,Py_ssize_tndigits,void**digits) | ||
| { | ||
| if (ndigits <=0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Why not allow 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
@gpshead asked to reject this case:https://discuss.python.org/t/pep-757-c-api-to-import-export-python-integers/63895/74
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Actually, he asked to document that case (i.e. when ndigits==0).
We don't think it's a good idea to overbloat docs with this edge case (different functions should be used for small integers). PEP has a dedicated section to discuss import for small integers, suggesting different functions. (In fact, the whole API is about import/export for big integers.)
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
encukou left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Looks good, though I have a few suggestions.
Also, please test thatPyLong_FreeExportcan be used afterPyLong_Export fills in the compactvalue. AFAICS, the tests now skip it if they can.
Uh oh!
There was an error while loading.Please reload this page.
| If *export_long->digits* is not ``NULL``,:c:func:`PyLong_FreeExport` must | ||
| be called when the export is no longer needed. | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I'm fine with it, as an implementation detail.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
vstinner commentedDec 12, 2024
@serhiy-storchaka and@encukou: I addressed your reviews. Please review the updated PR.
I added
Done. |
serhiy-storchaka left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
LGTM.
Please do not forget to edit the commit message before merging.
6446408 intopython:mainUh oh!
There was an error while loading.Please reload this page.
vstinner commentedDec 13, 2024
Ok, I merged the PR. Big thanks to everyone who reviewed this PR which got 272 comments and 54 commits! Special thanks to@skirpichev who wrote a big part of this work. |
skirpichev commentedFeb 26, 2025
IIUIC, Now anonymous unionsare allowed. Maybe we could use them in the export API in this way: typedefstructPyLongExport2 {union {int64_tcompact_value;struct {uint8_tnegative;Py_ssize_tndigits;constvoid*digits;Py_uintptr_t_reserved; }digit_array; };}PyLongExport2; ? This structure has smaller size. Also, such API make a more clear distinction for alternative views. |
vstinner commentedFeb 26, 2025
How do you know if compact_value or digit_array must be used? |
skirpichev commentedFeb 26, 2025
By PyLong_Export's return value. Errors < 0. Nonnegative values - for possible export types. |
vstinner commentedFeb 26, 2025
PEP 757 had to go through the C API Working Group and then the Steering Council. I don't think that this change is worth it to restart this validation process. |
Uh oh!
There was an error while loading.Please reload this page.
Add PyLong_Export() and PyLong_Import() functions and PyLong_LAYOUT structure.
📚 Documentation preview 📚:https://cpython-previews--121339.org.readthedocs.build/