Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

gh-107137: Add _PyTupleBuilder API to the internal C API#107139

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Closed
vstinner wants to merge1 commit intopython:mainfromvstinner:tuple_builder

Conversation

vstinner
Copy link
Member

@vstinnervstinner commentedJul 23, 2023
edited by bedevere-bot
Loading

Add _PyTupleBuilder structure and functions:

  • _PyTupleBuilder_Init()
  • _PyTupleBuilder_Alloc()
  • _PyTupleBuilder_Append()
  • _PyTupleBuilder_AppendUnsafe()
  • _PyTupleBuilder_Finish()
  • _PyTupleBuilder_Dealloc()

The builder tracks the size of the tuple and resize it in _PyTupleBuilder_Finish() if needed. Don't allocate empty tuple. Allocate an array of 16 objects on the stack to avoid allocating small tuple. _PyTupleBuilder_Append() overallocates the tuple by 25% to reduce the number of _PyTuple_Resize() calls.

Do no track the temporary internal tuple by the GC before _PyTupleBuilder_Finish() creates the final complete and consistent tuple object.

Use _PyTupleBuilder API in itertools batched_traverse(), PySequence_Tuple() and initialize_structseq_dict().

Add also helper functions:

  • _PyTuple_ResizeNoTrack()
  • _PyTuple_NewNoTrack()

@vstinner
Copy link
MemberAuthor

I chose to use unsignedsize_t in the structure in _PyTupleBuilder_Alloc()size argument. I'm not sure if it's better or worse than classic signedPy_ssize_t type.

cc@serhiy-storchaka@methane@pablogsal@erlend-aasland@corona10

@vstinner
Copy link
MemberAuthor

HPy API to build a tuples and lists:https://docs.hpyproject.org/en/latest/api-reference/builder.html HPy API seems to only support creating tuple of a size known in advance: it's not possible to extend or shrink the tuple.

@vstinner
Copy link
MemberAuthor

I'm considering to add a "Set" function later, and/or maybe a "GetUnsafe" function. But I prefer to start with a minimum API :-)

@gvanrossum
Copy link
Member

-1. This feels unnecessarily elaborate.

@vstinner
Copy link
MemberAuthor

This API is a fix for this old issue:#59313 "Incomplete tuple created by PyTuple_New() and accessed via the GC can trigged a crash" which was closed as "not a bug" in 2021 by@pablogsal.

@vstinner
Copy link
MemberAuthor

@gvanrossum:

-1. This feels unnecessarily elaborate.

Do you mean thatcapi-workgroup/problems#56 is not an issue, or that this API is too complicated to use?

@gvanrossum
Copy link
Member

I believe we should review the problem before jumping to a solution. And if this is the solution, I'm not sure that it's worth fixing the problem. So please hold your horses here.

@vstinner
Copy link
MemberAuthor

The root issue is that PyTuple_New() tracks directly the tuple by the GC. Moreover, _PyTuple_Resize() tracks also the tuple by the GC. My PR adds _PyTuple_NewNoTrack() and _PyTuple_ResizeNoTrack() to avoid this issue.

The _PyTupleBuilder is built on top of it to wrap the memory allocations.

Add _PyTupleBuilder structure and functions:* _PyTupleBuilder_Init()* _PyTupleBuilder_Alloc()* _PyTupleBuilder_Append()* _PyTupleBuilder_AppendUnsafe()* _PyTupleBuilder_Finish()* _PyTupleBuilder_Dealloc()The builder tracks the size of the tuple and resize it in_PyTupleBuilder_Finish() if needed. Don't allocate empty tuple.Allocate an array of 16 objects on the stack to avoid allocatingsmall tuple. _PyTupleBuilder_Append() overallocates the tuple by 25%to reduce the number of _PyTuple_Resize() calls.Do no track the temporary internal tuple by the GC before_PyTupleBuilder_Finish() creates the final complete and consistenttuple object.Use _PyTupleBuilder API in itertools batched_traverse(),PySequence_Tuple() and initialize_structseq_dict().Add also helper functions:* _PyTuple_ResizeNoTrack()* _PyTuple_NewNoTrack()
@vstinner
Copy link
MemberAuthor

PR rebased to fix a merge conflict.

@gvanrossum
Copy link
Member

All I am asking is that you hold off for now.

@vstinnervstinner marked this pull request as draftJuly 24, 2023 00:58
@vstinner
Copy link
MemberAuthor

I marked this PR as a draft.

gvanrossum reacted with thumbs up emoji

@corona10
Copy link
Member

corona10 commentedJul 24, 2023
edited
Loading

Guido, even if there is a better way to solve this issue, adding the internal private API doesn't look harmful.
If we decide to adopt a better solution in the future, we can replace the API with a better one at any time.

@gvanrossum
Copy link
Member

@corona10 I worry that APIs are forever, even internal ones. So I recommend that we have a discussion somewhere so we're all on the same page about the problem we're trying to solve andthen we can how to solve it, rather than jumping the gun. (I don't require unanimity, just more people having thought about it and come to roughly the same conclusion than just Victor.)

@corona10
Copy link
Member

So I recommend that we have a discussion somewhere so we're all on the same page about the problem we're trying to solve and then we can how to solve it, rather than jumping the gun

Thank you for the answer. CPython seems to be on the brink of change in many topics.
All topics are very controversial right now, and the time looks like some core team members have to stop everything for a while and watch. But I hope that the time is not that long :)

@vstinner
Copy link
MemberAuthor

In the main Python branch, you can still crash PySequence_Tuple() because it creates a temporary tuple with PyTuple_New() which is immediately tracked by the GC:

importgcTAG=object()defmonitor():lst= [xforxingc.get_referrers(TAG)ifisinstance(x,tuple)]t=lst[0]# this *is* the result tupleprint(t)# full of nulls !returnt# Keep it alive for some timedefmy_iter():yieldTAG# 'tag' gets stored in the result tuplet=monitor()forxinrange(10):yieldx# SystemError when the tuple needs to be resizedtuple(my_iter())

code from:#59313 (comment)

Program in gdb:

vstinner@mona$ gdb -args ./python x.py (...)(<object object at 0x7fffea574af0>, <NULL>, <NULL>, <NULL>, <NULL>, <NULL>, <NULL>, <NULL>, <NULL>, <NULL>)Breakpoint 1, _PyTuple_Resize (pv=0x7fffffffb8b0, newsize=25)    at Objects/tupleobject.c:910910        PyErr_BadInternalCall();(gdb) up#1  0x00000000004cd05e in PySequence_Tuple (    v=<generator at remote 0x7fffea585a90>) at Objects/abstract.c:21352135            if (_PyTuple_Resize(&result, n) != 0) {

The crash occurs in_PyTuple_Resize(): that's why I added no only_PyTuple_NewNoTrack(), but also_PyTuple_ResizeNoTrack().

The problem is that there is a second strong reference to the tuple (created bymonitor()):(Py_SIZE(v) != 0 && Py_REFCNT(v) != 1) test failed in_PyTuple_Resize().

@vstinner
Copy link
MemberAuthor

In the main Python branch, you can still crash PySequence_Tuple() because it creates a temporary tuple with PyTuple_New() which is immediately tracked by the GC

This PR fix this bug.

With this PR,monitor() cannot fail the tuple anymore, since it's no longer tracked by the GC while the tuple is being filled, and so it's no longer possible to crash Python inPySequence_Tuple().

monitor() fails at:

Traceback (most recent call last):  File "/home/vstinner/python/main/x.py", line 17, in <module>    tuple(my_iter())  File "/home/vstinner/python/main/x.py", line 13, in my_iter    t = monitor()        ^^^^^^^^^  File "/home/vstinner/python/main/x.py", line 7, in monitor    t = lst[0]   # this *is* the result tuple        ~~~^^^IndexError: list index out of range

@gvanrossum
Copy link
Member

@vstinner Please stop pushing. We've lived with this forever. It doesn't have to be fixed today.

@vstinner
Copy link
MemberAuthor

I extracted the non-controversial part of this PR, only _PyTuple_NewNoTrack() and _PyTuple_ResizeNoTrack(), in a new PR fixing the PySequence_Tuple() crash: PR#107183.

@rhettinger
Copy link
Contributor

Can tuples be made robust enough to survive being called while partially filled? Thetuple_dealloc code already usesPy_XDECREF to support NULL elements. Perhaps the other tuple methods could be similarly fortified.

If not, tools likeitertools.batched still have another reasonable defense against the likes ofgc.get_referrers(). WhenPyTuple_New(n) is called, it can be immediately filled withPy_None objects. Then as new data arrives, it can be swapped in. That way the tuple is always in a consistent state even if accessed by GC before completion.

@gvanrossum
Copy link
Member

Raymond's idea of filling withNone is more viable than it used to be sinceNone is now immortal (as of 3.12) so we won't need to worry about its refcount. Still, I worry that there might be C code that creates a new tuple and somehowrelies on the items beingNULL.

erlend-aasland reacted with thumbs up emoji

@vstinner
Copy link
MemberAuthor

It seems like issues solved by the proposed _PyTupleBuilder API have different solutions discussed at PR#107183. This API doesn't seem to be the preferred option, so I prefer to close my PR and investigate other options first.

@vstinner
Copy link
MemberAuthor

Follow-up: I created issue#111489 to make_PyTuple_FromArraySteal() and_PyList_FromArraySteal() functions public.

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@rhettingerrhettingerrhettinger left review comments

Assignees
No one assigned
Labels
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

5 participants
@vstinner@gvanrossum@corona10@rhettinger@bedevere-bot

[8]ページ先頭

©2009-2025 Movatter.jp