Movatterモバイル変換


[0]ホーム

URL:


Following system colour schemeSelected dark colour schemeSelected light colour scheme

Python Enhancement Proposals

PEP 255 – Simple Generators

Author:
Neil Schemenauer <nas at arctrix.com>,Tim Peters <tim.peters at gmail.com>,Magnus Lie Hetland <magnus at hetland.org>
Status:
Final
Type:
Standards Track
Requires:
234
Created:
18-May-2001
Python-Version:
2.2
Post-History:
14-Jun-2001, 23-Jun-2001

Table of Contents

Abstract

This PEP introduces the concept of generators to Python, as well as a newstatement used in conjunction with them, theyield statement.

Motivation

When a producer function has a hard enough job that it requires maintainingstate between values produced, most programming languages offer no pleasant andefficient solution beyond adding a callback function to the producer’s argumentlist, to be called with each value produced.

For example,tokenize.py in the standard library takes this approach: thecaller must pass atokeneater function totokenize(), called whenevertokenize() finds the next token. This allows tokenize to be coded in anatural way, but programs calling tokenize are typically convoluted by the needto remember between callbacks which token(s) were seen last. Thetokeneaterfunction intabnanny.py is a good example of that, maintaining a statemachine in global variables, to remember across callbacks what it has alreadyseen and what it hopes to see next. This was difficult to get workingcorrectly, and is still difficult for people to understand. Unfortunately,that’s typical of this approach.

An alternative would have been for tokenize to produce an entire parse of thePython program at once, in a large list. Then tokenize clients could bewritten in a natural way, using local variables and local control flow (such asloops and nested if statements) to keep track of their state. But this isn’tpractical: programs can be very large, so no a priori bound can be placed onthe memory needed to materialize the whole parse; and some tokenize clientsonly want to see whether something specific appears early in the program (e.g.,a future statement, or, as is done in IDLE, just the first indented statement),and then parsing the whole program first is a severe waste of time.

Another alternative would be to make tokenize aniterator,delivering thenext token whenever its.next() method is invoked. This is pleasant for thecaller in the same way a large list of results would be, but without the memoryand “what if I want to get out early?” drawbacks. However, this shifts theburden on tokenize to rememberits state between.next() invocations, andthe reader need only glance attokenize.tokenize_loop() to realize what ahorrid chore that would be. Or picture a recursive algorithm for producing thenodes of a general tree structure: to cast that into an iterator frameworkrequires removing the recursion manually and maintaining the state of thetraversal by hand.

A fourth option is to run the producer and consumer in separate threads. Thisallows both to maintain their states in natural ways, and so is pleasant forboth. Indeed, Demo/threads/Generator.py in the Python source distributionprovides a usable synchronized-communication class for doing that in a generalway. This doesn’t work on platforms without threads, though, and is very slowon platforms that do (compared to what is achievable without threads).

A final option is to use the Stackless[1] (PEP 219) variant implementation of Pythoninstead, which supports lightweight coroutines. This has much the sameprogrammatic benefits as the thread option, but is much more efficient.However, Stackless is a controversial rethinking of the Python core, and it maynot be possible for Jython to implement the same semantics. This PEP isn’t theplace to debate that, so suffice it to say here that generators provide auseful subset of Stackless functionality in a way that fits easily into thecurrent CPython implementation, and is believed to be relativelystraightforward for other Python implementations.

That exhausts the current alternatives. Some other high-level languagesprovide pleasant solutions, notably iterators in Sather[2], which wereinspired by iterators in CLU; and generators in Icon[3], a novel languagewhere every expressionis a generator. There are differences among these,but the basic idea is the same: provide a kind of function that can return anintermediate result (“the next value”) to its caller, but maintaining thefunction’s local state so that the function can be resumed again right where itleft off. A very simple example:

deffib():a,b=0,1while1:yieldba,b=b,a+b

Whenfib() is first invoked, it setsa to 0 andb to 1, then yieldsbback to its caller. The caller sees 1. Whenfib is resumed, from itspoint of view theyield statement is really the same as, say, aprintstatement:fib continues after the yield with all local state intact.aandb then become 1 and 1, andfib loops back to theyield, yielding1 to its invoker. And so on. Fromfib’s point of view it’s justdelivering a sequence of results, as if via callback. But from its caller’spoint of view, thefib invocation is an iterable object that can be resumedat will. As in the thread approach, this allows both sides to be coded in themost natural ways; but unlike the thread approach, this can be done efficientlyand on all platforms. Indeed, resuming a generator should be no more expensivethan a function call.

The same kind of approach applies to many producer/consumer functions. Forexample,tokenize.py could yield the next token instead of invoking acallback function with it as argument, and tokenize clients could iterate overthe tokens in a natural way: a Python generator is a kind of Pythoniterator, but of an especially powerful kind.

Specification: Yield

A new statement is introduced:

yield_stmt:"yield"expression_list

yield is a new keyword, so afuture statement (PEP 236) is needed to phasethis in: in the initial release, a module desiring to use generators mustinclude the line:

from__future__importgenerators

near the top (seePEP 236) for details). Modules using the identifieryield without afuture statement will trigger warnings. In thefollowing release,yield will be a language keyword and thefuturestatement will no longer be needed.

Theyield statement may only be used inside functions. A function thatcontains ayield statement is called a generator function. A generatorfunction is an ordinary function object in all respects, but has the newCO_GENERATOR flag set in the code object’s co_flags member.

When a generator function is called, the actual arguments are bound tofunction-local formal argument names in the usual way, but no code in the bodyof the function is executed. Instead a generator-iterator object is returned;this conforms to theiterator protocol, so in particular can be used infor-loops in a natural way. Note that when the intent is clear from context,the unqualified name “generator” may be used to refer either to agenerator-function or a generator-iterator.

Each time the.next() method of a generator-iterator is invoked, the codein the body of the generator-function is executed until ayield orreturn statement (see below) is encountered, or until the end of the bodyis reached.

If ayield statement is encountered, the state of the function is frozen,and the value ofexpression_list is returned to.next()’s caller. By“frozen” we mean that all local state is retained, including the currentbindings of local variables, the instruction pointer, and the internalevaluation stack: enough information is saved so that the next time.next() is invoked, the function can proceed exactly as if theyieldstatement were just another external call.

Restriction: Ayield statement is not allowed in thetry clause of atry/finally construct. The difficulty is that there’s no guarantee thegenerator will ever be resumed, hence no guarantee that the finally block willever get executed; that’s too much a violation of finally’s purpose to bear.

Restriction: A generator cannot be resumed while it is actively running:

>>>defg():...i=me.next()...yieldi>>>me=g()>>>me.next()Traceback (most recent call last):... File "<string>", line 2, in gValueError:generator already executing

Specification: Return

A generator function can also contain return statements of the form:

return

Note that anexpression_list is not allowed on return statements in the bodyof a generator (although, of course, they may appear in the bodies ofnon-generator functions nested within the generator).

When a return statement is encountered, control proceeds as in any functionreturn, executing the appropriatefinally clauses (if any exist). Then aStopIteration exception is raised, signalling that the iterator isexhausted. AStopIteration exception is also raised if control flows offthe end of the generator without an explicit return.

Note that return means “I’m done, and have nothing interesting to return”, forboth generator functions and non-generator functions.

Note that return isn’t always equivalent to raisingStopIteration: thedifference lies in how enclosingtry/except constructs are treated. Forexample,:

>>>deff1():...try:...return...except:...yield1>>>printlist(f1())[]

because, as in any function,return simply exits, but:

>>>deff2():...try:...raiseStopIteration...except:...yield42>>>printlist(f2())[42]

becauseStopIteration is captured by a bareexcept, as is anyexception.

Specification: Generators and Exception Propagation

If an unhandled exception– including, but not limited to,StopIteration–is raised by, or passes through, a generator function, then the exception ispassed on to the caller in the usual way, and subsequent attempts to resume thegenerator function raiseStopIteration. In other words, an unhandledexception terminates a generator’s useful life.

Example (not idiomatic but to illustrate the point):

>>>deff():...return1/0>>>defg():...yieldf()# the zero division exception propagates...yield42# and we'll never get here>>>k=g()>>>k.next()Traceback (most recent call last):  File"<stdin>", line1, in?  File"<stdin>", line2, ing  File"<stdin>", line2, infZeroDivisionError:integer division or modulo by zero>>>k.next()# and the generator cannot be resumedTraceback (most recent call last):  File"<stdin>", line1, in?StopIteration>>>

Specification: Try/Except/Finally

As noted earlier,yield is not allowed in thetry clause of atry/finally construct. A consequence is that generators should allocatecritical resources with great care. There is no restriction onyieldotherwise appearing infinally clauses,except clauses, or in thetry clause of atry/except construct:

>>>deff():...try:...yield1...try:...yield2...1/0...yield3# never get here...exceptZeroDivisionError:...yield4...yield5...raise...except:...yield6...yield7# the "raise" above stops this...except:...yield8...yield9...try:...x=12...finally:...yield10...yield11>>>printlist(f())[1, 2, 4, 5, 8, 9, 10, 11]>>>

Example

# A binary tree class.classTree:def__init__(self,label,left=None,right=None):self.label=labelself.left=leftself.right=rightdef__repr__(self,level=0,indent="    "):s=level*indent+`self.label`ifself.left:s=s+"\n"+self.left.__repr__(level+1,indent)ifself.right:s=s+"\n"+self.right.__repr__(level+1,indent)returnsdef__iter__(self):returninorder(self)# Create a Tree from a list.deftree(list):n=len(list)ifn==0:return[]i=n/2returnTree(list[i],tree(list[:i]),tree(list[i+1:]))# A recursive generator that generates Tree labels in in-order.definorder(t):ift:forxininorder(t.left):yieldxyieldt.labelforxininorder(t.right):yieldx# Show it off: create a tree.t=tree("ABCDEFGHIJKLMNOPQRSTUVWXYZ")# Print the nodes of the tree in in-order.forxint:printx,print# A non-recursive generator.definorder(node):stack=[]whilenode:whilenode.left:stack.append(node)node=node.leftyieldnode.labelwhilenotnode.right:try:node=stack.pop()exceptIndexError:returnyieldnode.labelnode=node.right# Exercise the non-recursive generator.forxint:printx,print

Both output blocks display:

ABCDEFGHIJKLMNOPQRSTUVWXYZ

Q & A

Why not a new keyword instead of reusingdef?

See BDFL Pronouncements section below.

Why a new keyword foryield? Why not a builtin function instead?

Control flow is much better expressed via keyword in Python, and yield is acontrol construct. It’s also believed that efficient implementation in Jythonrequires that the compiler be able to determine potential suspension points atcompile-time, and a new keyword makes that easy. The CPython referenceimplementation also exploits it heavily, to detect which functionsaregenerator-functions (although a new keyword in place ofdef would solvethat for CPython – but people asking the “why a new keyword?” question don’twant any new keyword).

Then why not some other special syntax without a new keyword?

For example, one of these instead ofyield3:

return3andcontinuereturnandcontinue3returngenerating3continuereturn3return>>,3fromgeneratorreturn3return>>3return<<3>>3<<3*3

Did I miss one <wink>? Out of hundreds of messages, I counted threesuggesting such an alternative, and extracted the above from them. It would benice not to need a new keyword, but nicer to makeyield very clear – Idon’t want to have todeduce that a yield is occurring from making sense of apreviously senseless sequence of keywords or operators. Still, if thisattracts enough interest, proponents should settle on a single consensussuggestion, and Guido will Pronounce on it.

Why allowreturn at all? Why not force termination to be spelledraiseStopIteration?

The mechanics ofStopIteration are low-level details, much like themechanics ofIndexError in Python 2.1: the implementation needs to dosomething well-defined under the covers, and Python exposes these mechanismsfor advanced users. That’s not an argument for forcing everyone to work atthat level, though.return means “I’m done” in any kind of function, andthat’s easy to explain and to use. Note thatreturn isn’t always equivalenttoraiseStopIteration in try/except construct, either (see the“Specification: Return” section).

Then why not allow an expression onreturn too?

Perhaps we will someday. In Icon,returnexpr means both “I’m done”, and“but I have one final useful value to return too, and this is it”. At thestart, and in the absence of compelling uses forreturnexpr, it’s simplycleaner to useyield exclusively for delivering values.

BDFL Pronouncements

Issue

Introduce another new keyword (say,gen orgenerator) in placeofdef, or otherwise alter the syntax, to distinguish generator-functionsfrom non-generator functions.

Con

In practice (how you think about them), generatorsare functions, butwith the twist that they’re resumable. The mechanics of how they’re set upis a comparatively minor technical issue, and introducing a new keyword wouldunhelpfully overemphasize the mechanics of how generators get started (a vitalbut tiny part of a generator’s life).

Pro

In reality (how you think about them), generator-functions are actuallyfactory functions that produce generator-iterators as if by magic. In thisrespect they’re radically different from non-generator functions, acting morelike a constructor than a function, so reusingdef is at best confusing.Ayield statement buried in the body is not enough warning that thesemantics are so different.

BDFL

def it stays. No argument on either side is totally convincing, so Ihave consulted my language designer’s intuition. It tells me that the syntaxproposed in the PEP is exactly right - not too hot, not too cold. But, likethe Oracle at Delphi in Greek mythology, it doesn’t tell me why, so I don’thave a rebuttal for the arguments against the PEP syntax. The best I can comeup with (apart from agreeing with the rebuttals … already made) is “FUD”.If this had been part of the language from day one, I very much doubt it wouldhave made Andrew Kuchling’s “Python Warts” page.

Reference Implementation

The current implementation, in a preliminary state (no docs, but well testedand solid), is part of Python’s CVS development tree[5]. Using this requiresthat you build Python from source.

This was derived from an earlier patch by Neil Schemenauer[4].

Footnotes and References

[1]
http://www.stackless.com/
[2]
“Iteration Abstraction in Sather”Murer, Omohundro, Stoutamire and Szyperskihttp://www.icsi.berkeley.edu/~sather/Publications/toplas.html
[3]
http://www.cs.arizona.edu/icon/
[4]
http://python.ca/nas/python/generator.diff
[5]
To experiment with this implementation, check out Python from CVSaccording to the instructions athttp://sf.net/cvs/?group_id=5470Note that the std testLib/test/test_generators.py contains manyexamples, including all those in this PEP.

Copyright

This document has been placed in the public domain.


Source:https://github.com/python/peps/blob/main/peps/pep-0255.rst

Last modified:2025-01-31 10:51:19 GMT


[8]ページ先頭

©2009-2025 Movatter.jp