Movatterモバイル変換


[0]ホーム

URL:


rfc:generators

Request for Comments: Generators

  • Date: 2012-06-05
  • Author: Nikita Popovnikic@php.net
  • Status: Implemented

Introduction

Generators provide an easy, boilerplate-free way of implementing iterators.

As an example, consider how you would implement thefile() function in userland code:

function getLinesFromFile($fileName){if(!$fileHandle=fopen($fileName,'r')){return;} $lines=[];while(false!==$line=fgets($fileHandle)){$lines[]=$line;} fclose($fileHandle); return$lines;} $lines= getLinesFromFile($fileName);foreach($linesas$line){// do something with $line}

The main disadvantage of this kind of code is evident: It will read the whole file into a large array. Dependingon how big the file is, this can easily hit the memory limit. This is not what you usually want. Instead youwant to get the lines one by one. This is what iterators are perfect for.

Sadly implementing iterators requires an insane amount of boilerplate code. E.g. consider this iterator variantof the above function:

class LineIteratorimplements Iterator{protected$fileHandle; protected$line;protected$i; publicfunction __construct($fileName){if(!$this->fileHandle=fopen($fileName,'r')){thrownew RuntimeException('Couldn\'t open file "'.$fileName.'"');}} publicfunctionrewind(){fseek($this->fileHandle,0);$this->line=fgets($this->fileHandle);$this->i=0;} publicfunction valid(){returnfalse!==$this->line;} publicfunctioncurrent(){return$this->line;} publicfunctionkey(){return$this->i;} publicfunctionnext(){if(false!==$this->line){$this->line=fgets($this->fileHandle);$this->i++;}} publicfunction __destruct(){fclose($this->fileHandle);}} $lines=new LineIterator($fileName);foreach($linesas$line){// do something with $line}

As you can see a very simple piece of code can easily become very complicated when turned into aniterator. Generators solve this problem and allow you to implement iterators in a very straightforwardmanner:

function getLinesFromFile($fileName){if(!$fileHandle=fopen($fileName,'r')){return;} while(false!==$line=fgets($fileHandle)){yield$line;} fclose($fileHandle);} $lines= getLinesFromFile($fileName);foreach($linesas$line){// do something with $line}

The code looks very similar to the array-based implementation. The main difference is that instead of pushingvalues into an array the values areyielded.

Generators work by passing control back and forth between the generator and the calling code:

When you first call the generator function ($lines = getLinesFromFile($fileName)) the passed argument is bound,but nothing of the code is actually executed. Instead the function directly returns aGenerator object. ThatGenerator object implements theIterator interface and is what is eventually traversed by theforeachloop:

Whenever theIterator::next() method is called PHP resumes the execution of the generator function until ithits ayield expression. The value of thatyield expression is whatIterator::current() then returns.

Generator methods, together with theIteratorAggregate interface, can be used to easily implement traversableclasses too:

class Testimplements IteratorAggregate{protected$data; publicfunction __construct(array$data){$this->data=$data;} publicfunction getIterator(){foreach($this->dataas$key=>$value){yield$key=>$value;}// or whatever other traversation logic the class has}} $test=new Test(['foo'=>'bar','bar'=>'foo']);foreach($testas$k=>$v){echo$k,' => ',$v,"\n";}

Generators can also be used the other way around, i.e. instead of producing values they can also consume them. Whenused in this way they are often referred to as enhanced generators, reverse generators or coroutines.

Coroutines are a rather advanced concept, so it very hard to come up with not too contrived an short examples.For an introduction see an exampleon how to parse streaming XML using coroutines.If you want to know more, I highly recommend checking outa presentationon this subject.

Specification

Recognition of generator functions

Any function which contains ayield statement is automatically a generator function.

The initial implementation required that generator functions are marked with an asterix modifier (function*). This method hasthe advantage that generators are more explicit and also allows for yield-less coroutines.

The automatic detection was chosen over the asterix modifier for the following reasons:

  • There is an existing generator implementation in HipHop PHP, which uses automatic-detection. Using the asterix modifier would break compatibility.
  • All existing generator implementations in other language (that I know of) also use automatic detection. This includes Python, JavaScript 1.7 and C#. The only exception to this is the generator support as defined by ECMAScript Harmony, but I know no browser that actually implements it in the defined way.
  • The syntax for by-reference yielding looks very ugly:function *&gen()
  • yield-less coroutines are a very narrow use case and are also possible with automatic-detection using a code likeif (false) yield;.

Basic behavior

When a generator function is called the execution is suspended immediately after parameter binding and aGenerator objectis returned.

TheGenerator object implements the following interface:

finalclass Generatorimplements Iterator{    voidrewind();    bool  valid();    mixedcurrent();    mixedkey();    voidnext();     mixed send(mixed$value);    mixedthrow(Exception$exception);}

If the generator is not yet at ayield statement (i.e. was just created and not yet used as an iterator), then any call torewind,valid,current,key,next orsend will resume the generator until the nextyield statement ishit.

Consider this example:

function gen(){echo'start';yield'middle';echo'end';} // Initial call does not output anything$gen= gen(); // Call to current() resumes the generator, thus "start" is echo'd.// Then the yield expression is hit and the string "middle" is returned// as the result of current() and then echo'd.echo$gen->current(); // Execution of the generator is resumed again, thus echoing "end"$gen->next();

A nice side-effect of this behavior is that coroutines do not have to be primed with anext() call before they can be used. (This isrequired in Python and also the reason why coroutines in Python usually use some kind of decorator that automatically primes the coroutine.)

Apart from the above theGenerator methods behave as follows:

  • rewind: Throws an exception if the generator is currently after the first yield. (More in the “Rewinding a generator” section.)
  • valid: Returnsfalse if the generator has been closed,true otherwise. (More in the “Closing a generator” section.)
  • current: Returns whatever was passed toyield ornull if nothing was passed or the generator is already closed.
  • key: Returns the yielded key or, if none was specified, an auto-incrementing key ornull if the generator is already closed. (More in the “Yielding keys” section.)
  • next: Resumes the generator (unless the generator is already closed).
  • send: Sets the return value of theyield expression and resumes the generator (unless the generator is already closed). (More in the “Sending values” section.)
  • throw: Throws an exception at the current suspension point in the generator. (More in the “Throwing into the generator” section.)

Yield syntax

The newly introducedyield keyword (T_YIELD) is used both for sending and receiving values inside the generator. There are three basic forms of theyield expression:

  • yield $key => $value: Yields the value$value with key$key.
  • yield $value: Yields the value$value with an auto-incrementing integer key.
  • yield: Yields the valuenull with an auto-incrementing integer key.

The return value of theyield expression is whatever was sent to the generator usingsend(). If nothing was sent (e.g. duringforeach iteration)null is returned.

To avoid ambiguities the first twoyield expression types have to be surrounded by parenthesis when used in expression-context. Some examples when parentheses are necessary and when they aren't:

// these three are statements, so they don't need parenthesisyield$key=>$value;yield$value;yield; // these are expressions, so they require parenthesis$data=(yield$key=>$value);$data=(yield$value); // to avoid strange (yield) syntax the parenthesis are not required here$data=yield;

Ifyield is used inside a language construct that already has native parentheses, then they don't have to be duplicated:

call(yield$value);// instead ofcall((yield$value)); if(yield$value){...}// instead ofif((yield$value)){...}

The only exception is thearray() structure. Not requiring parenthesis would be ambiguous here:

array(yield$key=>$value)// can be eitherarray((yield$key)=>$value)// orarray((yield$key=>$value))

Python also has parentheses requirements for expression-use ofyield. The only difference is that Python also requires parentheses for a value-lessyield (because the language does not use semicolons).

See also the"Alternative yield syntax considerations" section.

Yielding keys

The languages that currently implement generators don't have support for yielding keys (only values). This though is just a side-effectas these languages don't support keys in iterators in general.

In PHP on the other hand keys are explicitly part of the iteration process and it thus does not make sense to not addkey-yielding support. The syntax could be analogous to that offoreach loops andarray declarations:

yield$key=>$value;

Furthermore generators need to generate keys even if no key was explicitly yielded. In this case it seems reasonable to behavethe same as arrays do: Start with the key0 and always increment by one. If in between an integer key which is larger than thecurrent auto-key is explicitly yielded, then that will be used as the starting point for new auto-keys. All other yielded keysdo not affect the auto-key mechanism.

function gen(){yield'a';yield'b';yield'key'=>'c';yield'd';yield10=>'e';yield'f';} foreach(gen()as$key=>$value){echo$key,' => ',$value,"\n";} // outputs:0=> a1=> bkey=> c2=> d10=> e11=> f

This is the same behavior that arrays have (i.e. ifgen() instead simply returned an array with the yielded values the keys wouldbe same). The only difference occurs when the generator yield non-integer, but numeric keys. For arrays they are cast, for generatorsthe are not.

Yield by reference

Generators can also yield by values by reference. To do so the& modifier is added before the function name, just like it is donefor return by reference.

This for example allows you to create classes with by-ref iteration behavior (which is something that is completely impossible withnormal iterators):

class DataContainerimplements IteratorAggregate{protected$data; publicfunction __construct(array$data){$this->data=$data;} publicfunction&getIterator(){foreach($this->dataas$key=>&$value){yield$key=>$value;}}}

The class can then be iterated using by-refforeach:

$dataContainer=new DataContainer([1,2,3]);foreach($dataContaineras&$value){$value*=-1;} // $this->data is now [-1, -2, -3]

Only generators specifying the& modifier can be iterated by ref. If you try to iterate a non-ref generator by-ref anE_ERROR is thrown.

Sending values

Values can be sent into a generator using thesend() method.send($value) will set$value as the return valueof the currentyield expression and resume the generator. When the generator hits anotheryield expression the yielded value will bethe return value ofsend(). This is just a convenience feature to save an additional call tocurrent().

Values are always sent by-value. The reference modifier& only affects yielded values, not the ones sent back to the coroutine.

A simple example of sending values: Two (interchangeable) logging implementations:

function echoLogger(){while(true){echo'Log: '.yield."\n";}} function fileLogger($fileName){$fileHandle=fopen($fileName,'a');while(true){fwrite($fileHandle,yield."\n");}} $logger= echoLogger();// or$logger= fileLogger(__DIR__.'/log'); $logger->send('Foo');$logger->send('Bar');

Throwing into the generator

Exceptions can be thrown into the generator using theGenerator::throw() method. This will throw an exception in the generator's executioncontext and then resume the generator. It is roughly equivalent to replacing the currentyield expression with athrow statement andresuming then. If the generator is already closed the exception will be thrown in the callers context instead (which is equivalent to replacingthethrow() call with athrow statement). Thethrow() method will return the next yielded value (if the exception is caught and noother exception is thrown).

An example of the functionality:

function gen(){echo"Foo\n";    try{yield;} catch(Exception$e){echo"Exception: {$e->getMessage()}\n";}echo"Bar\n";} $gen= gen();$gen->rewind();// echos "Foo"$gen->throw(new Exception('Test'));// echos "Exception: Test"// and "Bar"

Rewinding a generator

Rewinding to some degree goes against the concept of generators, as they are mainly intended as one-time data sources that are notsupposed to be iterated another time. On the other hand, most generators probably *are* rewindable and it might make sense to allowit. One could argue though that rewinding a generator is really bad practice (especially if the generator is doing some expensivecalculation). Allowing it to rewind would look like it is a cheap operation, just like with arrays. Also rewinding (as in jumpingback to the execution context state at the initial call to the generator) can lead to unexpected behavior, e.g. in the following case:

function getSomeStuff(PDOStatement$stmt){foreach($stmtas$row){yield doSomethingWith($row);}}

Here rewinding would simply result in an empty iterator as the result set is already depleted.

For the above reasons generators will not support rewinding. Therewind method will throw an exception, unless the generator is currently before or at the first yield. This results in the following behavior:

$gen= createSomeGenerator(); // the rewind() call foreach is doing here is okay, because// the generator is before the first yieldforeach($genas$val){...} // the rewind() call of a second foreach loop on the other hand// throws an exceptionforeach($genas$val){...}

So basically callingrewind is only allowed if it wouldn't do anything (because the generator is already at its initial state). After that an exception is thrown, so accidentally reused generators are easy to find.

Cloning a generator

Generators cannot be cloned.

Support for cloning was included in the initial version, but removed in PHP 5.5 Beta 3 due to implementational difficulties, unclear semantics and no particularly convincing use cases.

Closing a generator

When a generator is closed it frees the suspended execution context (as well as all other held variables). After it has been closedvalid will returnfalse and bothcurrent andkey will returnnull.

A generator can be closed in two ways:

  • Reaching areturn statement (or the end of the function) in a generator or throwing an exception from it (without catching it inside the generator).
  • Removing all references to the generator object. In this case the generator will be closed as part of the garbage collection process.

If the generator contains (relevant)finally blocks those will be run. If the generator is force-closed (i.e. by removing all references) then it is notallowed to useyield in thefinally clause (a fatal error will be thrown). In all other casesyield is allowed infinally blocks.

The following resources are destructed while closing a generator:

  • The current execution context (execute_data)
  • Stack arguments for the generator call, and the additional execution context which is used to manage them.
  • The currently active symbol table (or the compiled variables if no symbol table is in use).
  • The current$this object.
  • If the generator is closed during a method call, the object which the method is invoked on (EX(object)).
  • If the generator is closed during a call, the arguments pushed to the stack.
  • Anyforeach loop variables which are still alive (taken frombrk_cont_array).
  • The current generator key and value

Currently it can happen that temporary variables are not cleaned up properly in edge-case situations. Exceptions are also subject tothis problem:https://bugs.php.net/bug.php?id=62210. If that bug could be fixed for exceptions, then it would also be fixed for generators.

Error conditions

This is a list of generators-related error conditions:

  • Usingyield outside a function:E_COMPILE_ERROR
  • Usingreturn with a value inside a generator:E_COMPILE_ERROR
  • Manual construction ofGenerator class:E_RECOVERABLE_ERROR (analogous toClosure behavior)
  • Yielding a key that isn't an integer or a key:E_ERROR (this is just a placeholder until Etienne's arbitrary-keys patch lands)
  • Trying to iterate a non-ref generator by-ref:Exception
  • Trying to traverse an already closed generator:Exception
  • Trying to rewind a generator after the first yield:Exception
  • Yielding a temp/const value by-ref:E_NOTICE (analogous toreturn behavior)
  • Yielding a string offset by-ref:E_ERROR (analogous toreturn behavior)
  • Yielding a by-val function return value by-ref:E_NOTICE (analogous toreturn behavior)

This list might not be exhaustive.

Performance

You can find a small micro benchmark athttps://gist.github.com/2975796. It compares several ways of iterating ranges:

  • Using generators (xrange)
  • Using iterators (RangeIterator)
  • Using arrays implemented in userland (urange)
  • Using arrays implemented internally (range)

For large ranges generators are consistently faster; about four times faster than an iterator implementation and even 40% faster than the nativerange implementation.

For small ranges (around one hundred elements) the variance of the results is rather high, but from multiple runs it seems that in this case generators are slightly slowerthan the native implementation, but still faster than the iterator variant.

The tests were run on a Ubuntu VM, so I'm not exactly sure how representative they are.

Some points from the discussion

Why not just use callback functions?

A question that has come up a few times during discussion: Why not use callback functions, instead of generators? For example the abovegetLinesFromFile function couldbe rewritten using a callback:

function processLinesFromFile($fileName, callable$callback){if(!$fileHandle=fopen($fileName,'r')){return;} while(false!==$line=fgets($fileHandle)){$callback($line);} fclose($fileHandle);} processLinesFromFile($fileName,function($line){// do something});

This approach has two main disadvantages:

Firstly, callbacks integrate badly into the existing PHP coding paradigms. Having quadruply-nested closures is something very normal in languages like JavaScript, butrather rare in PHP. Many things in PHP are based on iteration and generators can nicely integrate with this.

A concrete example, which was actually my initial motivation to write the generators patch:

protectedfunction getTests($directory,$fileExtension){$it=new RecursiveDirectoryIterator($directory);$it=new RecursiveIteratorIterator($it, RecursiveIteratorIterator::LEAVES_ONLY);$it=new RegexIterator($it,'(\.'.preg_quote($fileExtension).'$)'); $tests=array();foreach($itas$file){// read file$fileContents=file_get_contents($file); // parse sections$parts=array_map('trim',explode('-----',$fileContents)); // first part is the name$name=array_shift($parts); // multiple sections possible with always two forming a pairforeach(array_chunk($parts,2)as$chunk){$tests[]=array($name,$chunk[0],$chunk[1]);}} return$tests;}

This is a function which I use to provide test vectors to PHPUnit. I point it to a directory containing test files and then split up those test files into individual tests+ expected output. I can then use the result of the function to feed some test function via@dataProvider.

The problem with the above implementation obviously is that I have to read all tests into memory at once (instead of one-by-one).

How can I solve this problem? By turning it into an iterator obviously! But if you look closer, this isn't actually that easy, because I'm adding new tests in a nested loop.So I would have to implement some kind of complex push-back mechanism to solve the problem. And - getting back on topic - I can't use callbacks here either, because I needa traversable for use with@dataProvider. Generators on the other hand solve this problem very elegantly. Actually, all you have to do to turn it into a lazy generatoris replace$tests[] = withyield.

The second, more general problem with callbacks is that it's very hard to manage state across calls. The classic example is a lexer + parser system. If you implement thelexer using a callback (i.e.lex(string $sourceCode, callable $tokenConsumer)) you would have to figure out some way to keep state between subsequent calls. You'd haveto build some kind of state machine, which can quickly get really ugly, even for simple problems (just look at the hundreds of states that a typical LALR parser has). Again,generators solve this problem elegantly, because they maintain state implicitly, in the execution state.

Alternative yield syntax considerations

Andrew proposed to use a function-like syntax foryield instead of the keyword notation. The threeyield variants would then look as follows:

  • yield()
  • yield($value)
  • yield($key => $value)

The main advantage of this syntax is that it would avoid the strange parentheses requirements for theyield $value syntax.

One of the main issues with the pseudo-function syntax is that it makes the semantics ofyield less clear. Currently theyield syntax looks very similar to thereturnsyntax. Both are very similar in a function, so it is desirable to keep them similar in syntax too.

Generally PHP uses thekeyword $expr syntax instead of thekeyword($expr) syntax in all places where the statement-use is more common than the expression-use. E.g.include $file; is usually used as a statement and only very rarely as an expression.isset($var) on the other hand is normally used as an expression (a statement usewouldn't make any sense, actually).

Asyield will be used as a statement in the vast majority of cases theyield $expr syntax thus seems more appropriate. Furthermore the most common expression-use ofyield is value-less, in which case the parentheses requirements don't apply (i.e. you can write just$data = yield;).

So the function-likeyield($value) syntax would optimize a very rare use case (namely$recv = yield($send);), at the same time making the common use cases less clear.

Patch

The current implementation can be found in this branch:https://github.com/nikic/php-src/tree/addGeneratorsSupport.

I also created a PR so that the diff can be viewed more easily:https://github.com/php/php-src/pull/177

Vote

Should generators be merged into master?
Real nameYesNo
aharvey (aharvey) 
alan_k (alan_k) 
cataphract (cataphract) 
felipe (felipe) 
googleguy (googleguy) 
hradtke (hradtke) 
iliaa (iliaa) 
ircmaxell (ircmaxell) 
jpauli (jpauli) 
juliens (juliens) 
laruence (laruence) 
lbarnaud (lbarnaud) 
levim (levim) 
lstrojny (lstrojny) 
mariuz (mariuz) 
mfonda (mfonda) 
mj (mj) 
nikic (nikic) 
patrickallaert (patrickallaert) 
peehaa (peehaa) 
pollita (pollita) 
sebastian (sebastian) 
seld (seld) 
stas (stas) 
tyrael (tyrael) 
Final result:241
This poll has been closed.

Further resources

Implementation in Python:

Implementation in #"http://wiki.ecmascript.org/doku.php?id=harmony:generators" title="http://wiki.ecmascript.org/doku.php?id=harmony:generators" rel="ugc nofollow">Generators in ECMAScript Harmony

  • Implementation in C#:

    Extensive introductions into the topic:

    rfc/generators.txt · Last modified: by127.0.0.1

    

    Table of Contents


    [8]ページ先頭

    ©2009-2025 Movatter.jp