gevent: Asynchronous I/O made easy

Daniel Pope

This blogpost is a write-up of a talk I gave at EuroPython 2014. It is my initial draft of the talk, which was later turned into slides for the presentation.

The video is already on YouTube, so if you prefer you can watch:

gevent: Asynchronous I/O made easy (44 minutes)

You can see the slides used for the presentation here.

Introduction

gevent is a framework for scalable asynchronous I/O with a fully synchronous programming model.

Let's look at some examples of where we are going. Here's an echo server:

from gevent.server import StreamServer

def connection_handler(socket, address):
    for l in socket.makefile('r'):
        socket.sendall(l)

if __name__ == '__main__':
    server = StreamServer(('0.0.0.0', 8000), connection_handler)
    server.serve_forever()

In this example we make 100 web requests in parallel:

from gevent import monkey
monkey.patch_all()

import urllib2
from gevent.pool import Pool

def download(url):
    return urllib2.urlopen(url).read()

if __name__ == '__main__':
    urls = ['http://httpbin.org/get'] * 100
    pool = Pool(20)
    print pool.map(download, urls)

What is the strange monkey.patch_all() call? Not to worry, this isn't like your everyday monkey patching. This is just a distribution of Python that happens to be shipped as a set of monkey patches.

This final example is a chat server:

import gevent
from gevent.queue import Queue
from gevent.server import StreamServer

users = {}  # mapping of username -> Queue

def broadcast(msg):
    msg += '\n'
    for v in users.values():
        v.put(msg)

def reader(username, f):
    for l in f:
        msg = '%s> %s' % (username, l.strip())
        broadcast(msg)

def writer(q, sock):
    while True:
        msg = q.get()
        sock.sendall(msg)

def read_name(f, sock):
    while True:
        sock.sendall('Please enter your name: ')
        name = f.readline().strip()
        if name:
            if name in users:
                sock.sendall('That username is already taken.\n')
            else:
                return name

def handle(sock, client_addr):
    f = sock.makefile()
    name = read_name(f, sock)
    broadcast('## %s joined from %s.' % (name, client_addr[0]))
    q = Queue()
    users[name] = q
    try:
        r = gevent.spawn(reader, name, f)
        w = gevent.spawn(writer, q, sock)
        gevent.joinall([r, w])
    finally:
        del users[name]
        broadcast('## %s left the chat.' % name)

if __name__ == '__main__':
    import sys
    try:
        myip = sys.argv[1]
    except IndexError:
        myip = '0.0.0.0'
    print 'To join, telnet %s 8001' % myip
    s = StreamServer((myip, 8001), handle)
    s.serve_forever()

Simple enough? Let's look at why this gevent stuff is written like this.

Synchronous I/O

Synchronous I/O means that each I/O operation is allowed to block until it is complete.

To scale this to more than one user at a time, we need threads and processes. Each thread or process is allowed to individually block waiting on I/O. Because we have full concurrency, this blocking doesn't affect other operations; from the point of view of each thread/process the world stops until the operation is complete. The OS will resume the thread/process when the results are ready.

[Diagram: blocking I/O (/presentations/gevent-talk/_static/blocking-io.svg)]

Drawbacks of threads: terrible performance. See Dave Beazley's notes on the GIL. Also high memory use: threads in Linux allocate stack memory (see ulimit -s), which is not useful for Python - it will just cause you to run out of memory with relatively few threads.

Drawbacks of processes: no shared memory space. High memory use, both because of the stack allocation and copy-on-write. Threads in Linux are like a special kind of process; the kernel structures are more or less the same.

Asynchronous I/O

All asynchronous I/O falls back to the same pattern. It's not really about how the code executes but where the waiting is done. Multiple I/O activities need to unify their efforts to wait, so that the waiting happens in a single place in the code. When an event occurs, the asynchronous system needs to resume the section of the code that was waiting for that event.

The problem, then, is not how to do the waiting in a single place, but how to resume the one piece of code expecting to receive that event.

There are several different approaches to how to organise a single-threaded program so that all of the waiting can be done in a single place in the code, i.e. there are several different ideas about what resume() and waiter might be in the following event loop code:

read_waiters = {}
write_waiters = {}
timeout_waiters = []

def wait_for_read(fd, waiter):
    read_waiters[fd] = waiter

wait_for_write = write_waiters.__setitem__

def event_loop():
    while True:
        readfds = read_waiters.keys()
        writefds = write_waiters.keys()
        read, write, error = select.select(
            readfds,             # waiting for read
            writefds,            # waiting for write
            readfds + writefds,  # waiting for errors
        )
        for fd in read:
            resume(read_waiters.pop(fd))
        for fd in write:
            resume(write_waiters.pop(fd))
        # something about errors

We may want to add timeouts to the above code, in which case we might also include something like the following:

timeout_waiters = []

def wait_for_timeout(delay, waiter):
    when = time.time() + delay
    heapq.heappush(timeout_waiters, (when, waiter))

def event_loop():
    while True:
        now = time.time()
        read, write, error = select.select(
            rfds, wfds, efds,
            timeout_waiters[0][0] - time.time())
        while timeout_waiters and timeout_waiters[0][0] <= now:
            _, waiter = heapq.heappop(timeout_waiters)
            resume(waiter)
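Putting the two fragments together, a minimal runnable version of this event loop might look like the following. This is only a sketch: the names (wait_for_read, wait_for_timeout, run) are invented here, and "resume" is simply calling the stored waiter.

```python
import heapq
import select
import socket
import time

read_waiters = {}      # fd -> callable to call when fd is readable
timeout_waiters = []   # heap of (deadline, callable)

def wait_for_read(fd, waiter):
    read_waiters[fd] = waiter

def wait_for_timeout(delay, waiter):
    heapq.heappush(timeout_waiters, (time.time() + delay, waiter))

def run():
    # The single place in the program where all the waiting happens
    while read_waiters or timeout_waiters:
        timeout = None
        if timeout_waiters:
            timeout = max(0, timeout_waiters[0][0] - time.time())
        readable, _, _ = select.select(list(read_waiters), [], [], timeout)
        for fd in readable:
            read_waiters.pop(fd)()   # "resume" = call the waiter
        now = time.time()
        while timeout_waiters and timeout_waiters[0][0] <= now:
            heapq.heappop(timeout_waiters)[1]()

# Example: one socket read and one timer, multiplexed in one loop
events = []
a, b = socket.socketpair()
wait_for_read(b.fileno(), lambda: events.append(b.recv(100)))
wait_for_timeout(0.01, lambda: events.append('tick'))
a.sendall(b'hello')
run()
```

The loop exits once nothing is left to wait for; a real event loop would run forever and support cancelling waiters.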

All asynchronous I/O frameworks are built on the same kind of model; they just take different approaches to how to structure your code so that it can be suspended when an I/O operation is requested and resumed when that operation is completed.

Callbacks

Examples:

  • Javascript/Node
  • Tornado IOStream
  • Twisted Deferred
  • asyncio, under the hood

One approach is to just call a callable whenever data is available to read. Typically we want to act at a higher level than chunks of data, so we have a callback read and parse individual chunks of binary data and call an application callback when the parsed content is complete (such as an HTTP request or response).

What the user code looks like:

def start_beer_request():
    http.get('/api/beer', handle_response)

def handle_response(resp):
    beer = load_beer(resp.json)
    do_something(beer)

How can we tie the response back to a specific request? One answer is closures:

def get_beer(beer_id, callback):
    def handle_response(resp):
        beer = load_beer(resp.json)
        callback(beer)
    http.get('/api/beer/%d' % beer_id, handle_response)

Either way is ugly, especially if we wanted to chain more I/O calls (anyone fancy this nesting any deeper?). There's no escape from the nesting and programming in pieces. As it has been said, "Callbacks are the new Goto".

The call stack looks like this:

[Diagram: callbacks (/presentations/gevent-talk/_static/callbacks.svg)]

Note how the return values never go anywhere useful; indeed, the only way to pass results around is to write additional callbacks.

Method-based callbacks

Examples:

  • Twisted Protocols
  • Tornado RequestHandler
  • asyncio Transports/Protocols

Callbacks are a mess! So we might organise them into interfaces where the methods are automatically registered as callbacks, so that users can subclass the interfaces to implement them. For example, this is asyncio example code:

import asyncio

class EchoClient(asyncio.Protocol):
    message = 'This is the message. It will be echoed.'

    def connection_made(self, transport):
        transport.write(self.message.encode())
        print('data sent: {}'.format(self.message))

    def data_received(self, data):
        print('data received: {}'.format(data.decode()))

    def connection_lost(self, exc):
        print('server closed the connection')
        asyncio.get_event_loop().stop()

This solution doesn't completely solve the problem of eliminating callbacks; it just gives you a neater framework that avoids callbacks being dotted all over the code. It places constraints on what callbacks can be called and where they can be defined. Suppose you want to link two protocols together - for example, you need to make an HTTP request to a backend REST service in the middle of handling a request from a user. You still get the problem of logic being fragmented over multiple callbacks, and you still can't use the return values from any asynchronous function.

Error handling in callbacks

When using a callback-based programming model, you have to register additional error-handler callbacks.

Sadly, not all frameworks enforce this. One reason is that enforcing it makes every single program twice as hard to read. So it's optional, and consequently programmers don't always (or even often) do it.

This violates PEP 20:

Errors should never pass silently.
Unless explicitly silenced.

One of the bigger risks of not properly handling errors is that your internal state may get out of sync, you may deadlock waiting for an event that will never arrive, or you may hold resources indefinitely for a connection that has already disconnected.
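The usual remedy is to pass an error callback ("errback") alongside every success callback. A minimal pure-Python sketch of the pattern (async_divide and the callback names are invented here for illustration):

```python
# Sketch of the errback pattern: every async operation takes a success
# callback and an error callback, so failures have an explicit place to
# go instead of passing silently.
def async_divide(x, y, callback, errback):
    try:
        result = x / y
    except Exception as exc:
        errback(exc)        # route the error somewhere visible
    else:
        callback(result)

results, errors = [], []
async_divide(10, 2, results.append, errors.append)
async_divide(10, 0, results.append, errors.append)
# results now holds [5.0]; errors holds one ZeroDivisionError
```

Note that this doubles the number of callbacks the programmer must write, which is exactly why it so often doesn't happen.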

Generator-based coroutines

Examples:

  • tornado.gen
  • asyncio/Tulip

Generators have been used to implement coroutine-like functionality. This allows us to defer to some event loop system, which will resume the after-the-callback section of our code after the I/O operation has completed:

import asyncio

@asyncio.coroutine
def compute(x, y):
    print("Compute %s + %s ..." % (x, y))
    yield from asyncio.sleep(1.0)
    return x + y

@asyncio.coroutine
def print_sum(x, y):
    result = yield from compute(x, y)
    print("%s + %s = %s" % (x, y, result))

loop = asyncio.get_event_loop()
loop.run_until_complete(print_sum(1, 2))
loop.close()

(In this example we should note that we will want to spawn other async activities to get any scalability benefit out of this approach.)

When generators were introduced in PEP 255 they were described as providing coroutine-like functionality. PEP 342 and PEP 380 extended this with the ability to send exceptions to a generator and to yield from a subgenerator, respectively.

The term "coroutine" denotes a system where you have more than one routine - effectively, more than one call stack - active at a time. Because each call stack is preserved, a routine can suspend its state and yield to a different, named coroutine. In some languages there is a yield keyword of some form that invokes this behaviour (different, then, from the yield keyword in Python).

Why would you want to do this? It provides a primitive form of multitasking: cooperative multitasking. Unlike threads, coroutines are not preemptive, which means that they aren't interrupted (preempted) until they explicitly yield.

Generators are a subset of this behaviour, right down to the 'yield' terminology. Wikipedia calls them semicoroutines. However, there are two significant differences between generators and coroutines:

  1. Generators can only yield to the calling frame.
  2. Every frame in the stack needs to collaborate in yielding to the calling frame - so the top frame might yield, and all other calls in the stack need to be made with yield from.
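This collaboration can be seen in a tiny sketch with plain generators (the names inner, middle and drive are invented here; the driver plays the role of the event loop):

```python
# inner() suspends with `yield`, and every frame above it must
# collaborate with `yield from` to propagate the suspension upward.
def inner():
    data = yield 'need-data'     # suspends the whole stack up to the driver
    return data * 2

def middle():
    result = yield from inner()  # a plain call would not propagate the yield
    return result + 1

def drive():
    gen = middle()
    request = next(gen)          # runs until inner() suspends
    assert request == 'need-data'
    try:
        gen.send(20)             # "resume" the stack with the awaited value
    except StopIteration as stop:
        return stop.value

# drive() returns 41: inner() doubles 20, then middle() adds 1
```

If middle() forgot the `yield from`, the suspension would never reach the driver - which is exactly the fragility this section describes.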

The call stack will look as below:

[Diagram: generators (/presentations/gevent-talk/_static/generators.svg)]

Note that this use of yield means that you can't also use yield to write asynchronous generators. You have to return lists instead.

Greenlets/green threads

A greenlet is a full coroutine.

Examples:

  • gevent
  • greenlet
  • Stackless Python

Let's rewrite that asyncio example with gevent:

import gevent

def compute(x, y):
    print "Compute %s + %s ..." % (x, y)
    gevent.sleep(1.0)
    return x + y

def print_sum(x, y):
    result = compute(x, y)
    print "%s + %s = %s" % (x, y, result)

print_sum(1, 2)

(Again, we'll want to start other greenlets to get any scalability benefit out of this approach.)

That's much simpler! We can simply omit all the generator cruft because gevent.sleep() can yield to the event loop (called the hub in gevent) without the involvement of the calling frames. Also, because we have first-class coroutines, the hub can be instantiated on demand; it doesn't need to be created as the explicit parent of the stack.
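To see the scalability benefit, we can spawn both computations as greenlets so their sleeps overlap. A sketch (assuming gevent is installed; the 0.3s sleep stands in for an I/O wait):

```python
import time
import gevent

def compute(x, y):
    gevent.sleep(0.3)    # a "blocking" wait that yields to the hub
    return x + y

start = time.time()
jobs = [gevent.spawn(compute, 1, 2), gevent.spawn(compute, 3, 4)]
gevent.joinall(jobs)                    # wait for both greenlets
results = [job.value for job in jobs]   # [3, 7]
elapsed = time.time() - start
# The two sleeps overlap, so this takes ~0.3s rather than ~0.6s
```

gevent.spawn returns a Greenlet object whose result is available as .value after it finishes (or via .get(), which also re-raises any exception).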

Coroutines give us a magic jump right from our current stack to the event loop - and back when ready:

[Diagram: coroutines (/presentations/gevent-talk/_static/coroutines.svg)]

This single piece of C-level magic allows us to write code that appears synchronous, but actually has all the benefits of asynchronous I/O.

Gevent: greenlets plus monkey-patching

Suppose, though, that the sleep wasn't in our code? We can use gevent's monkey patching to ensure that no code changes are necessary:

# These two lines have to be the first thing in your program, before
# anything else is imported
from gevent.monkey import patch_all
patch_all()

import time

def compute(x, y):
    print "Compute %s + %s ..." % (x, y)
    time.sleep(1.0)
    return x + y

def print_sum(x, y):
    result = compute(x, y)
    print "%s + %s = %s" % (x, y, result)

print_sum(1, 2)

The bulk of blocking calls in the standard library are patched so that they yield to the hub rather than blocking. Likewise, the threading system is patched so that greenlets are spawned instead of threads.

Isn't monkey-patching bad?

In this case it's better to think of gevent as a distribution of Python thathappens to use green threads, and includes different implementations ofstandard library code.

This is part of the reason it has to come right at the top of the module that contains the entry point of the program: it's a statement that, before anything else happens, we want to use gevent's distribution of the stdlib.

While the monkey patching is elegant in that it allows pure Python applicationsand libraries to become asynchronous with no modifications, it is an optionalcomponent of gevent, and you can write asynchronous programs without it.

  • Works with any existing synchronous, pure Python code.
  • Generally works with async code too - only select() is emulated, but most async frameworks have a select()-based implementation. There's no reason epoll() etc. couldn't be implemented.

Examples of gevent threading primitives

Because gevent is based on lightweight "threads", the gevent library contains a wealth of concurrency tools that help you spawn greenlets, implement critical sections (locks/mutexes), and pass messages between greenlets.

Because it's a networking library it also has some higher-level network servermodules, such as a TCP server and WSGI server.

Spawning and killing greenlets

  • gevent.spawn(function, *args, **kwargs) - spawn a greenlet
  • gevent.kill(greenlet, exception=GreenletExit) - 'kill' a greenlet by raising an exception in that greenlet.

There are also higher-level primitives like gevent.pool, a greenlet-based equivalent to multiprocessing.pool.
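For instance, a Pool caps how many greenlets run at once - useful for limiting concurrent connections to a backend. A sketch (assuming gevent is installed; square and the sizes here are arbitrary):

```python
import gevent
from gevent.pool import Pool

def square(x):
    gevent.sleep(0.01)   # stands in for an I/O wait; yields to the hub
    return x * x

pool = Pool(5)                         # at most 5 greenlets at a time
results = pool.map(square, range(10))  # results come back in input order
```

pool.map has the same shape as the builtin map, so code written against iterables needs no changes to gain bounded concurrency.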

Synchronisation primitives

  • gevent.lock.Semaphore
  • gevent.lock.RLock
  • gevent.event.Event
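These work like their threading counterparts. A sketch of a critical section (assuming gevent is installed; worker and the log list are invented here):

```python
import gevent
from gevent.lock import Semaphore

sem = Semaphore(1)   # initial count 1 makes this a mutex
log = []

def worker(name):
    with sem:                      # blocks until the lock is free
        log.append(name + ':enter')
        gevent.sleep(0.01)         # yields to the hub while holding the lock
        log.append(name + ':exit')

gevent.joinall([gevent.spawn(worker, 'a'), gevent.spawn(worker, 'b')])
# The critical sections never interleave: a enters and exits before b
```

Note that the lock is only needed here because the greenlet "blocks" (the sleep) in the middle of the critical section; without a yield point the two appends would already be atomic.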

Message Passing

  • gevent.queue.Queue
  • gevent.event.AsyncResult - block waiting for a single result. Also allows an exception to be raised instead.
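A sketch of AsyncResult (assuming gevent is installed; producer is invented here):

```python
import gevent
from gevent.event import AsyncResult

result = AsyncResult()

def producer():
    gevent.sleep(0.01)   # simulate some I/O before the answer is ready
    result.set(42)       # wakes everything blocked in result.get()

gevent.spawn(producer)
value = result.get()     # blocks this greenlet until set() is called
```

Had the producer failed, it could call result.set_exception(exc) instead, and the exception would be re-raised out of get() - errors land in the code that was waiting for the result.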

Timeouts

There's a useful wrapper that aborts an operation if it doesn't succeed within a certain period of time. This can be used to wrap timeouts around pretty much any sequence of operations:

from gevent import Timeout

with Timeout(5):
    data = sock.recv()
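If the timeout fires, a Timeout exception is raised inside the block, so it can be handled like any other exception. A sketch (assuming gevent is installed; the sleep stands in for a slow recv()):

```python
import gevent
from gevent import Timeout

try:
    with Timeout(0.05):      # raise Timeout in this greenlet after 0.05s
        gevent.sleep(1)      # stands in for a slow sock.recv()
    outcome = 'finished'
except Timeout:
    outcome = 'timed out'    # this branch runs: the sleep never completes
```

Because the timeout surfaces as an exception at the point that was "blocking", cleanup can use ordinary try/finally rather than callback bookkeeping.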

Higher level server kit

  • gevent.server.StreamServer - TCP server. Also supports SSL.
  • gevent.server.DatagramServer - UDP server.
  • gevent.pywsgi.WSGIServer - WSGI server that supports streaming, keepalives, SSL etc.

Gevent I/O patterns

The recommended way of using gevent (and avoiding select()) is to spawn one greenlet for each direction of communication - read or write. The code for each greenlet is a simple loop that "blocks" whenever necessary. See the chat server code earlier for an example.

Why do we want a synchronous programming model?

First of all, it makes the code easier to read and understand. It's the same kind of programming you do when single-threaded. The code isn't dotted with yield from statements.

More significantly, of the methods described above, only gevent requires no changes to the calling conventions of other code. The importance of this cannot be overstated: it means that your business logic can happily call into blocking code.

As a very simple example, say we want to consume a streaming API. Our pre-existing business logic (process_orders()) expects an iterable. With gevent, we can stream this iterable asynchronously from a remote server with no code changes!

from gevent.socket import create_connection

def process_orders(lines):
    """Do something important with an iterable of lines."""
    for l in lines:
        ...

# Create an asynchronous file-like object with gevent
socket = create_connection((host, port))
f = socket.makefile(mode='r')
process_orders(f)

Another advantage is that exceptions can always be raised, in a sensible place.

Unlike real threads, greenlets are never suspended at arbitrary times, so you can get away with many fewer locks and mutexes - you only need a mutex if there's a risk you might "block" between atomic operations.

Also unlike real threads, a greenlet can "kill" another greenlet - causing anexception to be raised in that greenlet when it next resumes.
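By default that exception is GreenletExit, which the killed greenlet can catch to clean up. A sketch (assuming gevent is installed; worker and the state list are invented here):

```python
import gevent
from gevent import GreenletExit

state = []

def worker():
    try:
        while True:
            gevent.sleep(0.01)       # loop "forever", yielding each time
    except GreenletExit:
        state.append('cleaned up')   # the kill arrives as a normal exception
        raise                        # re-raising GreenletExit is a clean exit

g = gevent.spawn(worker)
gevent.sleep(0.03)   # give the worker time to start looping
gevent.kill(g)       # raises GreenletExit inside the worker
g.join()
```

Because the kill is delivered as an exception at a yield point, ordinary try/finally blocks in the victim run as usual - there's no equivalent safe mechanism for real threads.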

Disadvantages

Bad news: the Python 3 branch of gevent isn't finished. I have not investigated whether it is at all usable.

One solution is to allow different implementation languages in different parts of your stack. gevent is great at dealing with scalable I/O; Python 3 is good at Unicode and user-facing applications. It may be possible to arrange to use the right tool for each job.

Pitfalls of asynchronous I/O

Gevent shares several pitfalls with many asynchronous I/O frameworks:

  • Blocking (real blocking, at the kernel level) somewhere in your program halts everything. This is most likely in C code, where monkey patches don't take effect. You need to take careful steps to make your C libraries "green".
  • Keeping the CPU busy: greenlets are not preempted, so a CPU-bound greenlet will cause other greenlets never to be scheduled.
  • There exists the possibility of deadlock between greenlets.

In summary, gevent has very few pitfalls that are not present in other async I/O frameworks (sure, you probably can't deadlock callbacks, but only because the event loop probably doesn't provide the synchronisation primitives to do so, so you also can't implement critical sections).

One pitfall gevent sidesteps is that you are less likely to hit an async-unaware Python library that will block your app, because pure-Python libraries will use the monkey-patched stdlib.

n to m concurrency

An approach for even better scalability is to run n greenlets on m physical threads. In Python we need to do this with processes. This gives very good performance on multiprocessor systems, as well as adding resilience.

This is the model that is used by Rust and Java among others.

Experiences with gevent

I evaluated gevent, as well as the other systems mentioned, in 2011. gevent was the clear winner. There was little to choose between them in terms of performance, but gevent's significantly simpler programming model was a major selling point. Not all developers are at the level where they are comfortable with generators, closures and callbacks, and gevent doesn't require them: you can use whatever techniques make your code most legible.

The ability to use gevent with existing code, or business logic that is agnostic about I/O, was very valuable too: you probably want to re-use certain business logic libraries in both high-performance network apps and offline batch processes.

Over the following 18 months we wrote a variety of network applications. One byproduct was a web service framework called nucleon, which aimed to connect RESTful JSON, PostgreSQL and AMQP, all with 'green' driver code for high scalability.

Though we had to make code changes to use PostgreSQL without blocking, gevent's monkey patching meant that the pure-Python drivers for other data stores just worked - so Redis, ElasticSearch and CouchDB could all be used transparently.

The AMQP library was originally a fork of Puka (not Pika), an AMQP library that didn't try to force its own brand of async on you (like Pika). I eventually completely rewrote this and split it into a separate project called nucleon.amqp. nucleon.amqp allows interaction with an AMQP server with a fully synchronous programming model - remote queues on the AMQP broker can be exposed locally with the Queue API.

It wasn't without head-scratching moments, as with any concurrent programming project. However, as a team we adapted to gevent and found ourselves developing a language of diagrams to explain to each other and understand how the flow of control moves between greenlets, how they block and how they signal each other.

The project was successful - we were able to keep our code simple and maintainable while ensuring our services were fast and scalable (a claim backed up by load testing).
