Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

gh-105201: Add PyIter_NextItem to replace PyIter_Next which has an ambiguous output#105202

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Closed
iritkatriel wants to merge14 commits intopython:mainfromiritkatriel:iter_nextitem

Conversation

@iritkatriel
Copy link
Member

@iritkatrieliritkatriel commentedJun 1, 2023
edited
Loading

PyIter_Next returns NULL for both normal exit and error. The caller needs to callPyErr_Occurred() to disambiguate. This PR addsPyIter_NextItem, which is not ambiguous about errors.

while ((item =PyIter_Next(iterator))) {
PyObject *item;
int res;
while ((res = PyIter_NextItem(iterator, &item)) == 0 && item != NULL) {
Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I think it might be better ifPyIter_NextItem returned -1 for error, 0 for exhausted iterator and 1 if there is a 'next' value. Then we could just continue as long as it return 1 (without checking for NULL).

markshannon reacted with thumbs up emoji
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

My personal preference would be to stick with the C idiom of returning -1 or 0. I also think that the API design should not be discussed in a conversation on a PR.

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

After trying to migrate a few places, I agree. Working with 3 return values is not simpler.

erlend-aasland reacted with thumbs up emoji
Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

But I do think it should bePyObject* PyIter_NextItem(PyObject *o, int *err), otherwise we need two checks in every loop of iteration and that's just wasteful.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

You still need two checks, no?

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Not on every iteration. Only when item it NULL and you exit the loop you need to see what err is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

So the return value is still ambiguous. I thought the aim of the new API was an unambiguous return value.

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

The aim is that you can find out whether an error occurred without calling PyErr_Occurred().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Quoting from the issue:

We will try to move away from those APIs to alternative ones whose return values non-ambiguously indicate whether there has been an error.

@iritkatrieliritkatriel requested a review froma team as acode ownerJune 1, 2023 20:37
@rhettinger
Copy link
Contributor

Personally, I don't find the new API to be an improvement. While the overall caller logic would have the same structure, it takes an additional line to create the variable, adds pointer logic in place of the cleaner looking function call, and it creates an opportunity to lose track of where the error came from. I really don't think we need this.

Also, it is somewhat aggressive to label the long-standing API as "legacy" as if it is wrong or hard to read in some way. Likewise it is aggressive to label the new API as "preferred". That will only induce some new core dev to sweep through and replace every existing call even though the current code works just fine and the existing API will never go away (at least not without creating a ton of unnecessary work for the Python ecosystem).

@iritkatriel
Copy link
MemberAuthor

it creates an opportunity to lose track of where the error came from.

The opposite: currently you need to check PyErr_Occurred() so you can’t be sure where the error came from. With this change you know.

erlend-aasland reacted with thumbs up emoji

@erlend-aasland
Copy link
Contributor

I'm still -1, as explained in the issue. Also, the PR title is misleading; you're replacing one API with an ambiguous return value with another API with an ambiguous return value. It ends up adding yet another problematic API to the list of ambiguous APIs.

@iritkatriel
Copy link
MemberAuthor

iritkatriel commentedJun 3, 2023
edited
Loading

Where is the ambiguity? The return value is the pair (PyObject*, int). Everything you need it provided by the function, so you don’t need to access global state.

@iritkatrieliritkatriel changed the titlegh-105201: Add PyIter_NextItem to replace PyIter_Next which has an ambiguous return valuegh-105201: Add PyIter_NextItem to replace PyIter_Next which has an ambiguous outputJun 3, 2023
@iritkatriel
Copy link
MemberAuthor

I'm still -1, as explained in the issue. Also, the PR title is misleading; you're replacing one API with an ambiguous return value with another API with an ambiguous return value. It ends up adding yet another problematic API to the list of ambiguous APIs.

I reworded the title to ‘ambiguous output’ rather than ‘ambiguous return value’.

erlend-aasland reacted with thumbs up emoji

@rhettinger
Copy link
Contributor

The existing API has been used successfully for two decades and during that time I've not seen a single user request for an alternative API. There isn't a real user issue being solved here. ISTM that this is an invented problem and that the solution is in some ways worse than what we have now. In addition, having more than one way to do it will create more problems than it solves. Since this is part of the stable ABI, the new and old functions will live in perpetuity and create be a recurring source of confusion. Also, the new API will not be usable by any package wanting to be compatible will versions of Python before 3.12. Having a mix of techniques is an undesirable outcome.

zooba reacted with thumbs up emoji

@iritkatriel
Copy link
MemberAuthor

The issue of c api functions with ambiguous return values came up in the discussions that started at the recent language summit, about the problems with the c api (see the blog post about that). The problem is not only related to PyIter_Next usage in cpython, but to how the c api works for alternative python implementations (that don’t use cpython’s internal error handling mechanism.)

There was some discussion about whether we can fix problems like this incrementally, or we just need to give up on the current api and redesign a new one from scratch. I believe it’s always better to fix things incrementally when you can, and redesign from scratch only when you have to (the redesign can then focus on the problems that actually require it). Other core devs said it is not possible to fix the current c api, and I see now (almost at the first hurdle) what they mean. This probably means that we will have a very inflated list of problems “requiring” a new c api. Not because that is the right way to go about this, but because we won’t be able to get through discussions like this to make incremental fixes.

It probably won’t make much of a difference to the new c api (there will likely be a new one either way because not all problems can be fixed incrementally in the current one). But I do think it’s sad if the current c api (which will be with us for a long time) can’t evolve because most core devs have given up on that. And I certainly understand now why they have.

@iritkatriel
Copy link
MemberAuthor

Reopening following discussion on the issue.

@markshannon
Copy link
Member

I strongly preferint PyIter_NextItem(PyObject *iter, PyObject **next) for reasons explained on the issue.

erlend-aasland and zooba reacted with thumbs up emojierlend-aasland reacted with heart emoji

@markshannon
Copy link
Member

markshannon commentedJun 6, 2023
edited
Loading

@rhettinger@erlend-aasland
The reason that this is a problem, is that it makes the error handling in the VM and C API stateful. And that state causes alot of problems.

The existence of this state means we need to check that the VM is in a valid state at C API boundaries.
This is slow when we do it correctly, e.g. when calling builtin functions, and dangerous when we don't , e.g when calling slot wrappers.

Currently there are four possible states after calling an API function that may fail (which is all but a few):

  • Return a valid value, and don't set an exception
  • Return an invalid value, and set an exception
  • Return an valid value, and set an exception
  • Return an invalid value, and don't set an exception

The first two of those are correct, but even then the second case requires the caller to handle the exception, or it leaves the VM in an invalid state.
The third isalways wrong; it is dangerous if not checked for, so we need to always check for it.
The fourth is wrong for some functions, but for functions likePyIter_Next needs an additional check, which is another problem.

We have to constantly check that the exception is not set, because if it were and we callPyIter_Next on an empty iterator, and it returnsNULL without setting an exception, we would think that it failed, not that it terminated.

If the return values all C API functions (and C extension code) were unambiguous, then error handling would become (mostly) stateless, making the VM more robust and calling into C extensions potentially faster, as none of the resulting states can leave the VM in a invalid state.

  • Return a valid value, and don't set an exception. No problem
  • Return an invalid value, and set an exception
  • Return an valid value, and set an exception
  • Return an invalid value, and don't set an exception

The goal is thatPyErr_Occurred() is only called to fetch the error after a function returnsNULL or-1. That way it never needs to cleared or monitored, saving a lot of code that would no longer need to worry about whether there was a "current" exception, as there would be no such thing. The "current" exception should only be meaningful immediately after a failed C API call.

Here's an example. Suppose we are iterating over two iterators at the same time (like inzip):

    PyObject *a = PyIter_Next(iter1);    PyObject *b = PyIter_Next(iter2);

But we forget to check the error case ofiter1.

    PyObject *a = PyIter_Next(iter1);    if (a == NULL) {        a_exhausted = true;    }

but we do checkiter2

    PyObject *b = PyIter_Next(iter2);    if (b == NULL && PyErr_hasOcurred()) {         /* Bug here          * We think iter2 has failed, but iter2 may have          * terminated and iter1 raised an exception */    }
erlend-aasland and zooba reacted with thumbs up emoji

@markshannon
Copy link
Member

There is an efficiency issue here, as well.
Changing to usingPyIter_NextItem adds quite a lot of overhead, because of the impedance mismatch with the underlyingtp_iternext function pointer. We still need to do the additional check ofPyErr_Occurred() astp_iternext returnsNULL for either exhaustion or error.

If we change the underlying protocol to match that ofPyIter_NextItem things become a lot more efficient.

intPyIter_NextItem(PyObject*iter,PyObject**next){return (*Py_TYPE(iter)->tp_iternextitem)(iter,next);}

Note that iftp_iternextitem raisesStopIteration, then so doesPyIter_NextItem, so implementations oftp_iternextitem will need to be aware of this. This is a non-issue for almost any iterators other than generators, or wrappers around__next__ methods.

erlend-aasland and encukou reacted with thumbs up emoji

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

@JelleZijlstraJelleZijlstraJelleZijlstra left review comments

@erlend-aaslanderlend-aaslanderlend-aasland left review comments

@vstinnervstinnerAwaiting requested review from vstinner

@encukouencukouAwaiting requested review from encukouencukou is a code owner

@markshannonmarkshannonAwaiting requested review from markshannonmarkshannon will be requested when the pull request is marked ready for reviewmarkshannon is a code owner

@ammaraskarammaraskarAwaiting requested review from ammaraskarammaraskar will be requested when the pull request is marked ready for reviewammaraskar is a code owner

Assignees

No one assigned

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

6 participants

@iritkatriel@rhettinger@erlend-aasland@markshannon@JelleZijlstra@bedevere-bot

[8]ページ先頭

©2009-2025 Movatter.jp