We propose to extend the iterator protocol with a new __(a)iterclose__ slot, which is called automatically on exit from (async) for loops, regardless of how they exit. This allows for convenient, deterministic cleanup of resources held by iterators without reliance on the garbage collector. This is especially valuable for asynchronous generators.
In practical terms, the proposal here is divided into two separate parts: the handling of async iterators, which should ideally be implemented ASAP, and the handling of regular iterators, which is a larger but more relaxed project that can’t start until 3.7 at the earliest. But since the changes are closely related, and we probably don’t want to end up with async iterators and regular iterators diverging in the long run, it seems useful to look at them together.
Python iterables often hold resources which require cleanup. For example: file objects need to be closed; the WSGI spec adds a close method on top of the regular iterator protocol and demands that consumers call it at the appropriate time (though forgetting to do so is a frequent source of bugs); and PEP 342 (based on PEP 325) extended generator objects to add a close method to allow generators to clean up after themselves.
Generally, objects that need to clean up after themselves also define a __del__ method to ensure that this cleanup will happen eventually, when the object is garbage collected. However, relying on the garbage collector for cleanup like this causes serious problems in several cases:
- __del__ may be arbitrarily delayed – yet many situations require prompt cleanup of resources. Delayed cleanup produces problems like crashes due to file descriptor exhaustion, or WSGI timing middleware that collects bogus times.
- __del__ doesn’t have access to the coroutine runner; indeed, the coroutine runner might be garbage collected before the generator object. So relying on the garbage collector is effectively impossible without some kind of language extension. (PEP 525 does provide such an extension, but it has a number of limitations that this proposal fixes; see the “alternatives” section below for discussion.)

Fortunately, Python provides a standard tool for doing resource cleanup in a more structured way: with blocks. For example, this code opens a file but relies on the garbage collector to close it:
    def read_newline_separated_json(path):
        for line in open(path):
            yield json.loads(line)

    for document in read_newline_separated_json(path):
        ...
and recent versions of CPython will point this out by issuing a ResourceWarning, nudging us to fix it by adding a with block:
    def read_newline_separated_json(path):
        with open(path) as file_handle:      # <-- with block
            for line in file_handle:
                yield json.loads(line)

    for document in read_newline_separated_json(path):  # <-- outer for loop
        ...
But there’s a subtlety here, caused by the interaction of with blocks and generators. with blocks are Python’s main tool for managing cleanup, and they’re a powerful one, because they pin the lifetime of a resource to the lifetime of a stack frame. But this assumes that someone will take care of cleaning up the stack frame… and for generators, this requires that someone close them.
In this case, adding the with block is enough to shut up the ResourceWarning, but this is misleading – the file object cleanup here is still dependent on the garbage collector. The with block will only be unwound when the read_newline_separated_json generator is closed. If the outer for loop runs to completion then the cleanup will happen immediately; but if this loop is terminated early by a break or an exception, then the with block won’t fire until the generator object is garbage collected.
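To make the timing concrete, here is a small demonstration that runs on current Python (the file path is hypothetical):

    import json

    def read_newline_separated_json(path):
        with open(path) as file_handle:
            for line in file_handle:
                yield json.loads(line)

    gen = read_newline_separated_json("docs.jsonl")   # hypothetical file
    first = next(gen)      # starts the generator, which opens the file
    # If we abandon 'gen' here (say, after a break), the file stays open until
    # the garbage collector eventually finalizes the generator.  Explicitly:
    gen.close()            # throws GeneratorExit into the generator, unwinding
                           # the 'with' block and closing the file -- this is
                           # exactly the call that __(a)iterclose__ would automate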
The correct solution requires that all users of this API wrap every for loop in its own with block:
    with closing(read_newline_separated_json(path)) as genobj:
        for document in genobj:
            ...
This gets even worse if we consider the idiom of decomposing a complex pipeline into multiple nested generators:
    def read_users(path):
        with closing(read_newline_separated_json(path)) as gen:
            for document in gen:
                yield User.from_json(document)

    def users_in_group(path, group):
        with closing(read_users(path)) as gen:
            for user in gen:
                if user.group == group:
                    yield user
In general if you have N nested generators then you need N+1 with blocks to clean up 1 file. And good defensive programming would suggest that any time we use a generator, we should assume the possibility that there could be at least one with block somewhere in its (potentially transitive) call stack, either now or in the future, and thus always wrap it in a with. But in practice, basically nobody does this, because programmers would rather write buggy code than tiresome repetitive code. In simple cases like this there are some workarounds that good Python developers know (e.g. in this simple case it would be idiomatic to pass in a file handle instead of a path and move the resource management to the top level), but in general we cannot avoid the use of with/finally inside of generators, and thus dealing with this problem one way or another. When beauty and correctness fight then beauty tends to win, so it’s important to make correct code beautiful.
Still, is this worth fixing? Until async generators came along I would have argued yes, but that it was a low priority, since everyone seems to be muddling along okay – but async generators make it much more urgent. Async generators cannot do cleanup at all without some mechanism for deterministic cleanup that people will actually use, and async generators are particularly likely to hold resources like file descriptors. (After all, if they weren’t doing I/O, they’d be generators, not async generators.) So we have to do something, and it might as well be a comprehensive fix to the underlying problem. And it’s much easier to fix this now when async generators are first rolling out, than it will be to fix it later.
The proposal itself is simple in concept: add a __(a)iterclose__ method to the iterator protocol, and have (async) for loops call it when the loop is exited, even if this occurs via break or exception unwinding. Effectively, we’re taking the current cumbersome idiom (with block + for loop) and merging them together into a fancier for. This may seem non-orthogonal, but makes sense when you consider that the existence of generators means that with blocks actually depend on iterator cleanup to work reliably, plus experience showing that iterator cleanup is often a desirable feature in its own right.
PEP 525 proposes a set of global thread-local hooks managed by new sys.{get/set}_asyncgen_hooks() functions, which allow event loops to integrate with the garbage collector to run cleanup for async generators. In principle, this proposal and PEP 525 are complementary, in the same way that with blocks and __del__ are complementary: this proposal takes care of ensuring deterministic cleanup in most cases, while PEP 525’s GC hooks clean up anything that gets missed. But __aiterclose__ provides a number of advantages over GC hooks alone:
For example, the PEP 525 cleanup hooks apply only to async generators, not to async iterators in general. If you write an async iterator by hand as a class, like:

    class MyAsyncIterator:
        async def __anext__(self):
            ...
then you can’t refactor this into an async generator without changing its semantics, and vice-versa. This seems very unpythonic. (It also leaves open the question of what exactly class-based async iterators are supposed to do, given that they face exactly the same cleanup problems as async generators.) __aiterclose__, on the other hand, is defined at the protocol level, so it’s duck-type friendly and works for all iterators, not just generators; a sketch of such a hand-written async iterator appears at the end of this comparison.
Without __aiterclose__, it’s more or less guaranteed that developers who develop and test on CPython will produce libraries that leak resources when used on PyPy. Developers who do want to target alternative implementations will either have to take the defensive approach of wrapping every for loop in a with block, or else carefully audit their code to figure out which generators might possibly contain cleanup code and add with blocks around those only. With __aiterclose__, writing portable code becomes easy and natural.

Without __aiterclose__, developers who care about this kind of robustness will either have to take the defensive approach of wrapping every for loop in a with block, or else carefully audit their code to figure out which generators might possibly contain cleanup code. __aiterclose__ plugs this hole by performing cleanup in the caller’s context, so writing more robust code becomes the path of least resistance.

Without __aiterclose__, the absolute most minimalistic middleware in our system looks something like:

    async def noop_middleware(handler, request_header, request_body):
        async with aclosing(handler(request_header, request_body)) as aiter:
            async for response_item in aiter:
                yield response_item
Arguably in regular code one can get away with skipping the with block around for loops, depending on how confident one is that one understands the internal implementation of the generator. But here we have to cope with arbitrary response handlers, so without __aiterclose__, this with construction is a mandatory part of every middleware.
__aiterclose__ allows us to eliminate the mandatory boilerplate and an extra level of indentation from every middleware:
    async def noop_middleware(handler, request_header, request_body):
        async for response_item in handler(request_header, request_body):
            yield response_item
So the __aiterclose__ approach provides substantial advantages over GC hooks.
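To illustrate the protocol-level, duck-typed style mentioned in the first advantage above, a hand-written async iterator could opt into deterministic cleanup just by defining the slot. This is only a sketch: the class name is illustrative, and the reader object is assumed to provide awaitable readline() and aclose() methods.

    import json

    class AsyncJsonLines:
        def __init__(self, reader):
            self._reader = reader            # assumed: has readline() and aclose()

        def __aiter__(self):
            return self

        async def __anext__(self):
            line = await self._reader.readline()
            if not line:
                raise StopAsyncIteration
            return json.loads(line)

        async def __aiterclose__(self):
            # Under this proposal, 'async for' awaits this automatically on
            # loop exit, however the loop exits.
            await self._reader.aclose()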
This leaves open the question of whether we want a combination of GC hooks + __aiterclose__, or just __aiterclose__ alone. Since the vast majority of generators are iterated over using a for loop or equivalent, __aiterclose__ handles most situations before the GC has a chance to get involved. The case where GC hooks provide additional value is in code that does manual iteration, e.g.:
    agen = fetch_newline_separated_json_from_url(...)
    while True:
        document = await type(agen).__anext__(agen)
        if document["id"] == needle:
            break
    # doesn't do 'await agen.aclose()'
If we go with the GC-hooks + __aiterclose__ approach, this generator will eventually be cleaned up by GC calling the generator __del__ method, which then will use the hooks to call back into the event loop to run the cleanup code.
If we go with the no-GC-hooks approach, this generator will eventually be garbage collected, with the following effects:
- its __del__ method will issue a warning that the generator was not closed (similar to the existing “coroutine never awaited” warning), and
- the __del__ methods of the resources it holds (file objects and so on) will release the actual operating system resources.

The solution here – as the warning would indicate – is to fix the code so that it calls __aiterclose__, e.g. by using a with block:
    async with aclosing(fetch_newline_separated_json_from_url(...)) as agen:
        while True:
            document = await type(agen).__anext__(agen)
            if document["id"] == needle:
                break
Basically in this approach, the rule would be that if you want to manually implement the iterator protocol, then it’s your responsibility to implement all of it, and that now includes __(a)iterclose__.
GC hooks add non-trivial complexity in the form of (a) new global interpreter state, (b) a somewhat complicated control flow (e.g., async generator GC always involves resurrection, so the details of PEP 442 are important), and (c) a new public API in asyncio (await loop.shutdown_asyncgens()) that users have to remember to call at the appropriate time. (This last point in particular somewhat undermines the argument that GC hooks provide a safe backup to guarantee cleanup, since if shutdown_asyncgens() isn’t called correctly then I think it’s possible for generators to be silently discarded without their cleanup code being called; compare this to the __aiterclose__-only approach where in the worst case we still at least get a warning printed. This might be fixable.) All this considered, GC hooks arguably aren’t worth it, given that the only people they help are those who want to manually call __anext__ yet don’t want to manually call __aiterclose__. But Yury disagrees with me on this :-). And both options are viable.
Several commentators on python-dev and python-ideas have suggested that a pattern to avoid these problems is to always pass resources in from above, e.g. read_newline_separated_json should take a file object rather than a path, with cleanup handled at the top level:
    def read_newline_separated_json(file_handle):
        for line in file_handle:
            yield json.loads(line)

    def read_users(file_handle):
        for document in read_newline_separated_json(file_handle):
            yield User.from_json(document)

    with open(path) as file_handle:
        for user in read_users(file_handle):
            ...
This works well in simple cases; here it lets us avoid the “N+1 with blocks problem”. But unfortunately, it breaks down quickly when things get more complex. Consider if instead of reading from a file, our generator was reading from a streaming HTTP GET request – while handling redirects and authentication via OAUTH. Then we’d really want the sockets to be managed down inside our HTTP client library, not at the top level. Plus there are other cases where finally blocks embedded inside generators are important in their own right: db transaction management, emitting logging information during cleanup (one of the major motivating use cases for WSGI close), and so forth. So this is really a workaround for simple cases, not a general solution.
The semantics of __(a)iterclose__ are somewhat inspired by with blocks, but context managers are more powerful: __(a)exit__ can distinguish between a normal exit versus exception unwinding, and in the case of an exception it can examine the exception details and optionally suppress propagation. __(a)iterclose__ as proposed here does not have these powers, but one can imagine an alternative design where it did.
However, this seems like unwarranted complexity: experience suggests that it’s common for iterables to have close methods, and even to have __exit__ methods that call self.close(), but I’m not aware of any common cases that make use of __exit__’s full power. I also can’t think of any examples where this would be useful. And it seems unnecessarily confusing to allow iterators to affect flow control by swallowing exceptions – if you’re in a situation where you really want that, then you should probably use a real with block anyway.
This section describes where we want to eventually end up, though there are some backwards compatibility issues that mean we can’t jump directly here. A later section describes the transition plan.
Generally, __(a)iterclose__ implementations should:

- be idempotent, and
- perform any cleanup that is appropriate on the assumption that the iterator will not be used again after __(a)iterclose__ is called. In particular, once __(a)iterclose__ has been called then calling __(a)next__ produces undefined behavior.

And generally, any code which starts iterating through an iterable with the intention of exhausting it, should arrange to make sure that __(a)iterclose__ is eventually called, whether or not the iterator is actually exhausted.
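For illustration, a minimal hand-written iterator following these guidelines might look like the sketch below (the class name and the file-based resource are hypothetical examples, not part of the proposal):

    import json

    class JsonLines:
        """Iterate over newline-delimited JSON records in a file."""

        def __init__(self, path):
            self._file = open(path)

        def __iter__(self):
            return self

        def __next__(self):
            line = self._file.readline()
            if not line:
                self.__iterclose__()
                raise StopIteration
            return json.loads(line)

        def __iterclose__(self):
            # Idempotent: calling close() on an already-closed file is a no-op.
            self._file.close()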
The core proposal is the change in behavior of for loops. Given this Python code:
    for VAR in ITERABLE:
        LOOP-BODY
    else:
        ELSE-BODY
we desugar to the equivalent of:
    _iter = iter(ITERABLE)
    _iterclose = getattr(type(_iter), "__iterclose__", lambda obj: None)
    try:
        traditional-for VAR in _iter:
            LOOP-BODY
        else:
            ELSE-BODY
    finally:
        _iterclose(_iter)
where the “traditional-for statement” here is meant as a shorthand for the classic 3.5-and-earlier for loop semantics.
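For readers who want to see this as runnable code, here is a rough approximation of the desugaring written as a helper function. It is only a sketch: the else clause and break are omitted, and since no existing iterator defines __iterclose__ today, the getattr default makes it a no-op on current iterators.

    def run_for_loop(iterable, loop_body):
        _iter = iter(iterable)
        _iterclose = getattr(type(_iter), "__iterclose__", lambda obj: None)
        try:
            for item in _iter:          # classic 3.5-and-earlier loop semantics
                loop_body(item)
        finally:
            _iterclose(_iter)           # new: always runs on loop exit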
Besides the top-level for statement, Python also contains several other places where iterators are consumed. For consistency, these should call __iterclose__ as well using semantics equivalent to the above. This includes:
- for loops inside comprehensions
- * unpacking
- functions which accept and fully consume iterables, like list(it), tuple(it), itertools.product(it1, it2, ...), and others.

In addition, a yield from that successfully exhausts the called generator should as a last step call its __iterclose__ method. (Rationale: yield from already links the lifetime of the calling generator to the called generator; if the calling generator is closed when half-way through a yield from, then this will already automatically close the called generator.)
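As an illustration of the yield from rule (a sketch only; the file-reading generators are just examples):

    def inner(path):
        with open(path) as f:   # cleanup that depends on closing the generator
            for line in f:
                yield line

    def outer(path):
        # Under the proposal, when this yield from exhausts inner(path), it
        # calls the sub-generator's __iterclose__ as a last step.  If outer()
        # itself is closed half-way through, the existing yield from behavior
        # already closes the sub-generator.
        yield from inner(path)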
We also make the analogous changes to async iteration constructs, except that the new slot is called __aiterclose__, and it’s an async method that gets awaited.
Generator objects (including those created by generator comprehensions):

- __iterclose__ calls self.close()
- __del__ calls self.close() (same as now), and additionally issues a ResourceWarning if the generator wasn’t exhausted. This warning is hidden by default, but can be enabled for those who want to make sure they aren’t inadvertently relying on CPython-specific GC semantics.

Async generator objects (including those created by async generator comprehensions):

- __aiterclose__ calls self.aclose()
- __del__ issues a RuntimeWarning if aclose has not been called, since this probably indicates a latent bug, similar to the “coroutine never awaited” warning.

QUESTION: should file objects implement __iterclose__ to close the file? On the one hand this would make this change more disruptive; on the other hand people really like writing for line in open(...): ..., and if we get used to iterators taking care of their own cleanup then it might become very weird if files don’t.
The operator module gains two new functions, with semantics equivalent to the following:
    def iterclose(it):
        if not isinstance(it, collections.abc.Iterator):
            raise TypeError("not an iterator")
        if hasattr(type(it), "__iterclose__"):
            type(it).__iterclose__(it)

    async def aiterclose(ait):
        if not isinstance(ait, collections.abc.AsyncIterator):
            raise TypeError("not an async iterator")
        if hasattr(type(ait), "__aiterclose__"):
            await type(ait).__aiterclose__(ait)
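For example, manual-iteration code could use the helper like this (a sketch that assumes operator.iterclose exists as specified above):

    gen = read_newline_separated_json(path)
    try:
        first_document = next(gen)
    finally:
        # For a generator, this ends up calling gen.close() via the proposed
        # generator.__iterclose__ method.
        operator.iterclose(gen)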
The itertools module gains a new iterator wrapper that can be used to selectively disable the new __iterclose__ behavior:
    # QUESTION: I feel like there might be a better name for this one?
    class preserve:
        def __init__(self, iterable):
            self._it = iter(iterable)

        def __iter__(self):
            return self

        def __next__(self):
            return next(self._it)

        def __iterclose__(self):
            # Swallow __iterclose__ without passing it on
            pass
Example usage (assuming that file objects implement __iterclose__):
    with open(...) as handle:
        # Iterate through the same file twice:
        for line in itertools.preserve(handle):
            ...
        handle.seek(0)
        for line in itertools.preserve(handle):
            ...
    @contextlib.contextmanager
    def iterclosing(iterable):
        it = iter(iterable)
        try:
            yield preserve(it)
        finally:
            iterclose(it)
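A sketch of how this helper might be used: the for loop below iterates through the preserve wrapper, so breaking out of it does not close the underlying iterator, and cleanup happens when the with block exits.

    with iterclosing(read_newline_separated_json(path)) as it:
        for document in it:
            if document["id"] == needle:
                break
        # 'it' is still usable here; iterclose() runs when the with block exits.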
Python ships a number of iterator types that act as wrappers around other iterators: map, zip, itertools.accumulate, csv.reader, and others. These iterators should define a __iterclose__ method which calls __iterclose__ in turn on their underlying iterators. For example, map could be implemented as:
    # Helper function
    def map_chaining_exceptions(fn, items, last_exc=None):
        for item in items:
            try:
                fn(item)
            except BaseException as new_exc:
                if new_exc.__context__ is None:
                    new_exc.__context__ = last_exc
                last_exc = new_exc
        if last_exc is not None:
            raise last_exc

    class map:
        def __init__(self, fn, *iterables):
            self._fn = fn
            self._iters = [iter(iterable) for iterable in iterables]

        def __iter__(self):
            return self

        def __next__(self):
            return self._fn(*[next(it) for it in self._iters])

        def __iterclose__(self):
            map_chaining_exceptions(operator.iterclose, self._iters)

    def chain(*iterables):
        iterables = list(iterables)   # so we can pop() below
        try:
            while iterables:
                for element in iterables.pop(0):
                    yield element
        except BaseException as e:
            def iterclose_iterable(iterable):
                operator.iterclose(iter(iterable))
            map_chaining_exceptions(iterclose_iterable, iterables, last_exc=e)
In some cases this requires some subtlety; for example, itertools.tee should not call __iterclose__ on the underlying iterator until it has been called on all of the clone iterators.
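One way to realize that rule (a hedged sketch, not the proposed itertools.tee implementation; per-clone buffering is omitted for brevity) is to refcount the clones and close the shared source only when the last clone is closed:

    class ClosingTeeClone:
        def __init__(self, shared):
            self._shared = shared        # {"it": source_iterator, "open": clone_count}
            self._closed = False

        def __iter__(self):
            return self

        def __next__(self):
            return next(self._shared["it"])   # a real tee would buffer per clone

        def __iterclose__(self):
            if self._closed:
                return
            self._closed = True
            self._shared["open"] -= 1
            if self._shared["open"] == 0:
                # The proposed operator.iterclose() would go here; the protocol
                # lookup is spelled out to keep the sketch self-contained.
                source = self._shared["it"]
                getattr(type(source), "__iterclose__", lambda obj: None)(source)

    def closing_tee(iterable, n=2):
        shared = {"it": iter(iterable), "open": n}
        return tuple(ClosingTeeClone(shared) for _ in range(n))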
The payoff for all this is that we can now write straightforward code like:
    def read_newline_separated_json(path):
        for line in open(path):
            yield json.loads(line)
and be confident that the file will receive deterministic cleanup without the end-user having to take any special effort, even in complex cases. For example, consider this silly pipeline:
    list(map(lambda key: key.upper(),
             (doc["key"] for doc in read_newline_separated_json(path))))
If our file contains a document where doc["key"] turns out to be an integer, then the following sequence of events will happen:
1. key.upper() raises an AttributeError, which propagates out of the map and triggers the implicit finally block inside list.
2. The finally block in list calls __iterclose__() on the map object.
3. map.__iterclose__() calls __iterclose__() on the generator comprehension object.
4. This injects a GeneratorExit exception into the generator comprehension body, which is currently suspended inside the comprehension’s for loop body.
5. The exception propagates out of that for loop, triggering the for loop’s implicit finally block, which calls __iterclose__ on the generator object representing the call to read_newline_separated_json.
6. This injects an inner GeneratorExit exception into the body of read_newline_separated_json, currently suspended at the yield.
7. The inner GeneratorExit propagates out of the for loop, triggering the for loop’s implicit finally block, which calls __iterclose__() on the file object, closing the file.
8. The inner GeneratorExit resumes propagating, hits the boundary of the generator function, and causes read_newline_separated_json’s __iterclose__() method to return successfully.
9. Control returns to the generator comprehension body, and the outer GeneratorExit continues propagating, allowing the comprehension’s __iterclose__() to return successfully.
10. The rest of the __iterclose__() calls unwind without incident, back into the body of list.
11. The original AttributeError resumes propagating.

(The details above assume that we implement file.__iterclose__; if not then add a with block to read_newline_separated_json and essentially the same logic goes through.)
Of course, from the user’s point of view, this can be simplified down to just:
1. int.upper() raises an AttributeError
2. The file object is closed.
3. The AttributeError propagates out of list
So we’ve accomplished our goal of making this “just work” without the user having to think about it.
While the majority of existing for loops will continue to produce identical results, the proposed changes will produce backwards-incompatible behavior in some cases. Example:
    def read_csv_with_header(lines_iterable):
        lines_iterator = iter(lines_iterable)

        for line in lines_iterator:
            column_names = line.strip().split("\t")
            break

        for line in lines_iterator:
            values = line.strip().split("\t")
            record = dict(zip(column_names, values))
            yield record
This code used to be correct, but after this proposal is implemented will require an itertools.preserve call added to the first for loop.
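The fixed version would look something like this (using the itertools.preserve wrapper described above):

    def read_csv_with_header(lines_iterable):
        lines_iterator = iter(lines_iterable)

        # preserve() swallows __iterclose__, so breaking out of this loop no
        # longer closes lines_iterator.
        for line in itertools.preserve(lines_iterator):
            column_names = line.strip().split("\t")
            break

        for line in lines_iterator:
            values = line.strip().split("\t")
            record = dict(zip(column_names, values))
            yield record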
[QUESTION: currently, if you close a generator and then try to iterate over it then it just raises Stop(Async)Iteration, so code that passes the same generator object to multiple for loops but forgets to use itertools.preserve won’t see an obvious error – the second for loop will just exit immediately. Perhaps it would be better if iterating a closed generator raised a RuntimeError? Note that files don’t have this problem – attempting to iterate a closed file object already raises ValueError.]
Specifically, the incompatibility happens when all of these factors come together:

- the automatic calling of __(a)iterclose__ is enabled
- the iterable did not previously define __(a)iterclose__
- the iterable does now define __(a)iterclose__
- the iterable is re-used after the for loop exits

So the problem is how to manage this transition, and those are the levers we have to work with.
First, observe that the only async iterables where we propose to add __aiterclose__ are async generators, and there is currently no existing code using async generators (though this will start changing very soon), so the async changes do not produce any backwards incompatibilities. (There is existing code using async iterators, but using the new async for loop on an old async iterator is harmless, because old async iterators don’t have __aiterclose__.) In addition, PEP 525 was accepted on a provisional basis, and async generators are by far the biggest beneficiary of this PEP’s proposed changes. Therefore, I think we should strongly consider enabling __aiterclose__ for async for loops and async generators ASAP, ideally for 3.6.0 or 3.6.1.
For the non-async world, things are harder, but here’s a potential transition path:
In 3.7:
Our goal is that existing unsafe code will start emitting warnings, while those who want to opt in to the future can do that immediately:
- We add the __iterclose__ methods described above.
- If from __future__ import iterclose is in effect, then for loops and * unpacking call __iterclose__ as specified above.
- If the __future__ import is not in effect, then for loops and * unpacking do not call __iterclose__. But they do call some other method instead, e.g. __iterclose_warning__.
- Similarly, functions like list use stack introspection (!!) to check whether their direct caller has __future__.iterclose enabled, and use this to decide whether to call __iterclose__ or __iterclose_warning__ (see the sketch after this list).
- For the wrapper iterators, we also add __iterclose_warning__ methods that forward to the __iterclose_warning__ method of the underlying iterator or iterators.
- For generators (and files, if we decide to add file.__iterclose__), __iterclose_warning__ is defined to set an internal flag, and other methods on the object are modified to check for this flag. If they find the flag set, they issue a PendingDeprecationWarning to inform the user that in the future this sequence would have led to a use-after-close situation and the user should use preserve().
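To make the stack-introspection idea concrete, here is a very rough sketch of how a consuming function might check its caller during the transition. Everything here is hypothetical: the flag value, the __iterclose_warning__ fallback, and the helper names are illustrations only, not part of the specification.

    import sys

    # Hypothetical compiler flag that "from __future__ import iterclose" would
    # set on code objects; no such flag exists today.
    CO_FUTURE_ITERCLOSE = 0x1000000

    def _caller_wants_iterclose(depth=2):
        # Does the calling frame's code object carry the (hypothetical) flag?
        frame = sys._getframe(depth)
        return bool(frame.f_code.co_flags & CO_FUTURE_ITERCLOSE)

    def list_transitional(iterable):
        # Sketch of how list() might choose between old and new behavior.
        it = iter(iterable)
        try:
            return [x for x in it]
        finally:
            if _caller_wants_iterclose():
                getattr(type(it), "__iterclose__", lambda obj: None)(it)
            else:
                getattr(type(it), "__iterclose_warning__", lambda obj: None)(it)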
In 3.8:

- Switch the PendingDeprecationWarning to a DeprecationWarning.

In 3.9:
- Enable the __future__ behavior unconditionally and remove all the __iterclose_warning__ machinery.

I believe that this satisfies the normal requirements for this kind of transition – opt-in initially, with warnings targeted precisely to the cases that will be affected, and a long deprecation cycle.
Probably the most controversial / risky part of this is the use of stack introspection to make the iterable-consuming functions sensitive to a __future__ setting, though I haven’t thought of any situation where it would actually go wrong yet…
Thanks to Yury Selivanov, Armin Rigo, and Carl Friedrich Bolz for helpful discussion on earlier versions of this idea.
This document has been placed in the public domain.