It's Friday again and time for another Friday Q&A. This week I'm going to discuss a framework for creating generators in Objective-C. I'm indulging myself a bit with this, because nobody suggested it, but nothing in my suggestion bank inspired me this time around, so I decided to go with something of my own.
Generators
Agenerator is essentially a function which remembers its state between invocations. In a normal function, the second time you call it is just like the first. Execution starts over at the top, local variables reset their values, etc. With a generator, execution starts where it left off, and local variables remember their values.
In Python, a simple generator looks like this:
defcountfrom(n):whileTrue:yieldnn+=1
yield statement takes the place of thereturn statement that you'd find in a normal function. When execution hitsyield, the value is returned but the function's state is also saved. The net result is that successive calls return successively increasing numbers.Things aren't actually quite this simple in Python. A call tocountfrom doesn't start returning numbers, but rather returns an object. Calling thenext method on the object it returns will return the successive numbers.
Using myMAGenerator, things are much the same. The generator is considerably more wordy and uglier, but this should be so surprise considering that it's C, and that this is basically hacked on support rather than a language feature. The example counter generator usingMAGenerator would look like this:
GENERATOR(int,CountFrom(intstart),(void)){__blockintn;GENERATOR_BEGIN(void){n=start;while(1){GENERATOR_YIELD(n);n++;}}GENERATOR_END}
CountFrom doesn't start returning numbers. Instead, it returns a block. Calling the block will then start returning numbers. Here's an example of how you'd actually use this generator:int(^counter)(void)=CountFrom(42);for(inti=0;i<10;i++)NSLog(@"%d",counter());
Basic Implementation
The basic concept behind howMAGenerator works was inspired bySimon Tatham's Coroutines in C. His coroutines are considerably more limited due to the fact that C without blocks can't really handle what's needed, but there's a great deal of cleverness in the implementation.
There are two major problems that need to be solved. The first problem is that of resuming execution where it left off. This is solved using a switch statement. The individualcase labels give places to resume execution. In C,case labels can be put practically anywhere, even inside other control constructs, and the jump can be made and everything continues without any fuss.
The second problem is that of preserving state. Tatham solved this one by simply usingstatic variables instead of locals. While this works, it has the severe limitation of only allowing one instance of the function to be active. Using blocks, we can work around this much more nicely by using variables captured from the enclosing scope. Thus, here is the basic idea of what ourCountFrom function would look like if we implemented it directly instead of with the help of macros:
// this crazy thing is how you declare a function that returns a blockint(^CountFrom(intstart))(void){__blockintstate=0;__blockintn;int(^block)(void)=^{switch(state){case0:n=start;while(1){state=1;returnn;case1:;// yes, this is legal!n++;}}};return[[blockcopy]autorelease];}
CountFrom returns a block, which captures thestart parameter as well as thestate andn local variables. Those are qualified with__block so that they can be mutated from within the block.Thestate variable is where the magic happens. When you call the block, execution always begins at the top, just like with a function. However, the first thing it encounters is a switch statement. By setting the state variable to correspond to the location of the return statement, this switch statement causes execution to resume immediately after it. Since all of our variables are captured from the enclosing scope, they remember their contents across calls.
Cleanup
Let's consider a little string builder generator. This generator will take path components and build a complete path out of them. Here's a first stab at it:
NSString*(^PathBuilder(void))(NSString*component){__blockintstate=0;__blockNSString*path;NSString*(^block)(NSString*component)=^{switch(state){case0:path=@"/";while(1){path=[pathstringByAppendingPathComponent:component];state=1;returnpath;case1:;}}};return[[blockcopy]autorelease];}
__block variables don't retain what they point to. Ourpath variable points to an un-owned string. If the caller happens to pop an autorelease pool in between calls to the path builder generator, that will destroy the string, leave a dangling pointer inpath, and cause a crash at the next call.We could fix this by copying the string, like so:
NSString*(^PathBuilder(void))(NSString*component){__blockintstate=0;__blockNSString*path;__blockNSString*newPath;NSString*(^block)(NSString*component)=^{switch(state){case0:path=@"/";while(1){newPath=[pathstringByAppendingPathComponent:component];[pathrelease];path=[newPathcopy];state=1;returnpath;case1:;}}};return[[blockcopy]autorelease];}
path.The proper solution is to create a cleanup block which gets executed when the main block is destroyed. This can be accomplished by taking advantage of the fact that blocksdo automatically manage the memory of captured non-__block object pointer variables. We can create an object:
NSString*(^PathBuilder(void))(NSString*component){idcleanupObj=...;
...NSString*(^block)(NSString*component)=^{[cleanupObjpath];...
[cleanupObjcallBlockWhenDeallocated:^{...}];
CFMutableArray with custom callbacks set up so that the retain callback would copy the object, and the release callback would cast it to the appropriate block pointer, call it, and then release it. By adding the cleanup block to the array, it will be automatically invoked when the array (and the generator block) is destroyed.If you assume thatMAGeneratorMakeCleanupArray does the job of setting up the appropriateCFMutableArray (and casting it to anNSMutableArray then the final code, with cleanup, looks like this:
NSString*(^PathBuilder(void))(NSString*component){__blockintstate=0;__blockNSString*path;__blockNSString*newPath;NSMutableArray*cleanupArray=MAGeneratorMakeCleanupArray();NSString*(^block)(NSString*component)=^{[cleanupArrayself];// meaningless reference to capture itswitch(state){case0:path=@"/";while(1){newPath=[pathstringByAppendingPathComponent:component];[pathrelease];path=[newPathcopy];state=1;returnpath;case1:;}}};[cleanupArrayaddObject:^{[pathrelease];}];return[[blockcopy]autorelease];}
Macroization
Much of the above, while workable, is tremendously annoying to write. To help, I turned all the boilerplate parts into macros.
To start with, theGENERATOR macro takes care of the function declaration as well as a variable to hold the cleanup block. It also declares aGENERATOR_zeroReturnValue variable, which is used to quiet compiler warnings about failing to return a value at the end of the block. Unfortunately, this trick means you can't use these macros to create a generator which returnsvoid. However, since you can just declare another type (likeint) and ignore the value in the caller, this is not a big problem. Here's what it looks like:
#defineGENERATOR(returnType,nameAndCreationParams,perCallParams) \returnType(^nameAndCreationParams)perCallParams \{ \returnTypeGENERATOR_zeroReturnValue; \bzero(&GENERATOR;_zeroReturnValue,sizeof(GENERATOR_zeroReturnValue)); \returnType(^GENERATOR_cleanupBlock)(void)=nil;
GENERATOR.TheGENERATOR_DECL works the same as theGENERATOR macro, but only produces the prototype, making it suitable for a declaration in a header file.
#defineGENERATOR_DECL(returnType,nameAndCreationParams,perCallParams) \returnType(^nameAndCreationParams)perCallParams
GENERATOR_BEGIN, which takes the generator's parameters again. This declares the state variable (calledGENERATOR_where, the cleanup array, and starts off the block:#defineGENERATOR_BEGIN(...) \__blockintGENERATOR_where=-1; \NSMutableArray*GENERATOR_cleanupArray=MAGeneratorMakeCleanupArray(); \idGENERATOR_mainBlock=^(__VA_ARGS__){ \[GENERATOR_cleanupArrayself]; \switch(GENERATOR_where) \{ \case-1:
GENERATOR_YIELD macro. This macro does exactly the same thing as the sequence we wrote manually above. The one trick here is that we need a unique number to identify the state value for this yield. Previously we could just write 1, 2, 3, 4, etc. in the code. We could make this macro take another parameter for the state value, but it's annoying to have to keep track of them all. Instead, as Tatham did, I chose to use the__LINE__ macro, which gets replaced by the line number of the current line of code.#defineGENERATOR_YIELD(...) \do{ \GENERATOR_where=__LINE__; \return__VA_ARGS__; \case__LINE__:; \}while(0)
GENERATOR_YIELDs on the same line.Next we want the ability to define a cleanup block, which we'll do with aGENERATOR_CLEANUP macro. This macro gets a little funky, because the macros are designed to work whetherGENERATOR_CLEANUP is present or not.GENERATOR_END, which will end the definition of the generator, needs to work properly whether it followsGENERATOR_CLEANUP orGENERATOR_BEGIN. This is done in this macro by adding a superfluous set of curly braces to the cleanup block:
#defineGENERATOR_CLEANUP \} \GENERATOR_where=-1; \returnGENERATOR_zeroReturnValue; \}; \GENERATOR_cleanupBlock=^{{
GENERATOR_CLEANUP is present or not. Thus, it has the same ending with the return statement, which works in both contexts. Finally it checks to see if a cleanup block has been set, adds it to the cleanup array if so, and then returns the newly created generator block.#defineGENERATOR_END \} \GENERATOR_where=-1; \returnGENERATOR_zeroReturnValue; \}; \if(GENERATOR_cleanupBlock) \[GENERATOR_cleanupArrayaddObject:^{GENERATOR_cleanupBlock();}]; \return[[GENERATOR_mainBlockcopy]autorelease]; \}
return GENERATOR_zeroReturnValue; statement in the cleanup block, the cleanup block needs to have the same return type as the generator block. Since the array has no way to figure out the proper block type to invoke, we wrap the cleanup block in a nicevoid/void block that it can deal with.Results
These macros are pretty scary, but the results are quite nice. This is the same example that I showed at the beginning:
GENERATOR(int,CountFrom(intstart),(void)){__blockintn;GENERATOR_BEGIN(void){n=start;while(1){GENERATOR_YIELD(n);n++;}}GENERATOR_END}
PathBuilder generator that I used as an example above without the macros, put in macro form:GENERATOR(NSString*,PathBuilder(void),(NSString*component)){__blockNSString*path;__blockNSString*newPath;GENERATOR_BEGIN(NSString*component){path=@"/";while(1){newPath=[pathstringByAppendingPathComponent:component];[pathrelease];path=[newPathcopy];GENERATOR_YIELD(path);}}GENERATOR_CLEANUP{[pathrelease];}GENERATOR_END}
Caveats
As with all blasphemous crimes against nature, MAGenerator has a few caveats.
GENERATOR_BEGIN. Local variables declared within the generator block will not remember their state between invocation. You can get away with this with carefully structured code, but it's best not to try.for/in loop. You cannot safely usefor/in loops (other loop constructs are fine) unless the body contains no invocations ofGENERATOR_YIELD.GENERATOR_YIELD on the same line. This is not a big hardship: just separate them with a carriage return. Thankfully, this one will generate a compiler error, not a difficult runtime error.switch statements used within the generator must not contain any invocations ofGENERATOR_YIELD. Because the generator's execution state is resumed using aswitch statement, and becauseGENERATOR_YIELD createscase labels, usingGENERATOR_YIELD inside your ownswitch statement will cause confusion, because the generated case labels will belong to the innerswitch rather than the macro-generated execution dispatch one.GENERATOR_YIELD. You can't assume that the caller will keep an autorelease pool around for you until you're done.GENERATOR_CLEANUP takes care of this, but it's still harder than it is in normal code.Potential Uses
I'm fairly new to the world of generators, and of course everybody is new to the world of generators in Objective-C, but I have some ideas of how they could be productively used.
Lazy Evaluation
A major use of generations in Python is to lazily evaluate enumerated values. The generator creates new values on demand rather than requiring them all to be precomputed. MAGenerator includes a convenience function,MAGeneratorEnumerator, which takes a generator with no per-call parameters and anid return type, and returns an object conforming toNSFastEnumeration which enumerates by successively calling the generator.
As an example, here's a generator which will find files with a certain extension in a certain path:
GENERATOR(id,FileFinder(NSString*path,NSString*extension),(void)){NSDirectoryEnumerator*enumerator=[[NSFileManagerdefaultManager]enumeratorAtPath:path];__blockNSString*subpath;GENERATOR_BEGIN(void){while((subpath=[enumeratornextObject])){if([[subpathpathExtension]isEqualToString:extension])GENERATOR_YIELD((id)[pathstringByAppendingPathComponent:subpath]);}}GENERATOR_END}
NSEnumerator subclass which wraps theNSDirectoryEnumerator, but that requires a lot more code. You could write code which takes a block as a parameter and invokes it for each one it finds, but that's less natural. Here's what it looks like to use this generator:for(NSString*pathinMAGeneratorEnumerator(FileFinder(@"/Applications",@"app")))NSLog(@"%@",path);
Replacing State Machines
Code frequently requires state machines. Imagine something like a network protocol parser. The natural way to write this is to write code that flows from top to bottom. Read a character, act on it. Read the next character, act on it. Have loops for reading strings, read individual characters to skip over separators, etc.
Then asynchronous programming appears and suddenly this doesn't seem like a good idea. Every one of those read calls could block. You could put it in a separate thread, but that has its own problems. You look into doing callback-based networking, usingNSStream or GCD, but suddenly you have to deal with buffers, keep around a bunch of explicit state about where you are in the protocol parsing, etc.
By virtue of their inside-out approach to programming, generators allow you to invert the flow of control while preserving the nice linear code that you get from the synchronous approach.
As an example, consider a very simple RLE decoder, which reads pairs of{ count, value } bytes. Normally the decoder, if asynchronous, would have to keep track of what state it's in so that it knows whether the next byte it receives is a count or a value. However, using a generator, that state information disappears, replaced with the implicit generator state:
GENERATOR(int,RLEDecoder(void(^emit)(char)),(unsignedcharbyte)){__blockunsignedcharcount;GENERATOR_BEGIN(unsignedcharbyte){while(1){count=byte;GENERATOR_YIELD(0);while(count--)emit(byte);GENERATOR_YIELD(0);}}GENERATOR_END}
For a much more complicated example, here's an HTTP response parser written as a generator. It takes several blocks as parameters to use as callbacks for when it finishes parsing the various components of the response.
GENERATOR(int,HTTPParser(void(^responseCallback)(NSString*),void(^headerCallback)(NSDictionary*),void(^bodyCallback)(NSData*),void(^errorCallback)(NSString*)),(intbyte)){NSMutableData*responseData=[NSMutableDatadata];NSMutableDictionary*headers=[NSMutableDictionarydictionary];__blockNSMutableData*currentHeaderData=nil;__blockNSString*currentHeaderKey=nil;NSMutableData*bodyData=[NSMutableDatadata];GENERATOR_BEGIN(charbyte){// read response linewhile(byte!='\r'){AppendByte(responseData,byte);GENERATOR_YIELD(0);}responseCallback(SafeUTF8String(responseData));GENERATOR_YIELD(0);// eat the \rif(byte!='\n')errorCallback(@"bad CRLF after response line");GENERATOR_YIELD(0);// eat the \n// read headerswhile(1){currentHeaderData=[[NSMutableDataalloc]init];while(byte!=':'&&byte!='\r'){AppendByte(currentHeaderData,byte);GENERATOR_YIELD(0);}// empty line means we're done with headersif(byte=='\r'&&[currentHeaderDatalength]==0)break;elseif(byte=='\r')errorCallback(@"No colon found in header line");else{GENERATOR_YIELD(0);if(byte==' ')GENERATOR_YIELD(0);currentHeaderKey=[SafeUTF8String(currentHeaderData)copy];[currentHeaderDatarelease];currentHeaderData=[[NSMutableDataalloc]init];while(byte!='\r'){AppendByte(currentHeaderData,byte);GENERATOR_YIELD(0);}NSString*currentHeaderValue=SafeUTF8String(currentHeaderData);[currentHeaderDatarelease];[headerssetObject:currentHeaderValueforKey:currentHeaderKey];[currentHeaderKeyrelease];}GENERATOR_YIELD(0);if(byte!='\n')errorCallback(@"bad CRLF after header line");GENERATOR_YIELD(0);// eat the \n}headerCallback(headers);// read bodywhile(byte!=-1){AppendByte(bodyData,byte);GENERATOR_YIELD(0);}bodyCallback(bodyData);}GENERATOR_CLEANUP{[currentHeaderDatarelease];[currentHeaderKeyrelease];}GENERATOR_END}
int(^httpParser)(int)=HTTPParser(/* callbacks go here */);
httpParser(newByte);
httpParser(-1);
newByte provides enough information for the parser to have parsed a new component of the response, the appropriate callback block is invoked.This generator is declared to returnint since MAGenerator doesn't supportvoid generators. The return value is simply ignored in this case. Execution begins at the top, with the first character from the data stream stored inbyte. Each time it invokesGENERATOR_YIELD, the effect is to get a newbyte from the caller. Parsing thus flows naturally from top to bottom, even though this code is in fact heavily asynchronous.
It should be noted that these parsers are not very efficient, because they require a block call (a function call plus additional overhead) for every single byte. This could be easily remedied if necessary, however. Modify the generator to take a buffer instead of a single byte. Then, instead of blind invocation ofGENERATOR_YIELD, the code would advance a cursor in the buffer, and only invokeGENERATOR_YIELD when the buffer is empty and needs to be refilled. This would get you down to one call per buffer instead of one per byte.
Wrapping Up
Now you know how to build generators in Objective-C using blocks, and have at least some ideas of how to use them productively. Of course you shouldn't be building them manually, but rather you should use my MAGenerator library! MAGenerator is open source. You can get the source by checking it out from my subversion repository:
svn cohttp://www.mikeash.com/svn/MAGenerator/Or browse it by just clicking on the URL above. It's available under the MIT license, so you can use it in pretty much any project you like. The project includes everything needed to write generators, as well as a bunch of examples/test cases.
I'm sure there are improvements that could be made, and patches are welcome. Depending on the change you're making I can't guarantee that I'll accept it, but of course you're always welcome to fork it if I don't.
That's it for this week. It looks like this is my longest Friday Q&A to date, so I hope it makes up for just getting a code dump last week. Come back week for another frightening edition. As always (except this week) Friday Q&A is driven by your submissions. If you have a topic that you would like to see covered here,send it in!
GENERATOR(id, Zipper(NSArray *a, NSArray *b), (void))
{
NSEnumerator *aenum = [a objectEnumerator];
NSEnumerator *benum = [b objectEnumerator];
GENERATOR_BEGIN(void)
{
while(1)
{
GENERATOR_YIELD([a nextObject]);
GENERATOR_YIELD([b nextObject]);
}
}
GENERATOR_END
}
Add your thoughts, post a comment:
Spam and off-topic posts will be deleted without notice. Culprits may be publicly humiliated at my sole discretion.