Thursday, August 02, 2012

Greening the Python standard library with gevent

Continuing my series on gevent and Python, this article covers what you need to do when you want to use the Python standard library with gevent, and shows how gevent lets you monkey-patch the standard library to make it compatible. If you're just getting started with gevent, you might want to read the previous articles in this series first.

And now that you're all caught up, let's get started with gevent...

Blocking is bad

You may have read that introduction and wondered to yourself why you can't use the Python standard library as-is for your gevent programs. The answer lies in the way gevent accomplishes its cooperative multithreading. At the core of gevent is an event loop. The important thing to realize about the event loop is that it is where gevent decides which greenlet will run next.

The way the event loop actually gets control of your program in gevent is that you call one of the gevent functions that implicitly enter the loop. For instance, conceptually, gevent.sleep works like this:

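def sleep(seconds):
    # Conceptual pseudocode: these helper names are illustrative only.
    ev = schedule_timeout_event(seconds)             # timer that fires after `seconds`
    schedule_greenlet(current_greenlet(), ready=ev)  # resume this greenlet when it fires
    switch_to_event_loop()                           # let other greenlets run meanwhile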

So rather than actually blocking the current thread, as time.sleep would do, gevent.sleep registers an event that will fire to tell the event loop that this greenlet is ready to run, and then switches control to the event loop.

The problem, then, with using a standard library function that doesn't know about the event loop is that it will simply block without returning control to the event loop. If the event loop doesn't run, no other greenlets can run. So if you call time.sleep in your gevent program, you'll simply freeze everything until the sleep returns.
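
To make the idea of "sleeping by handing control back to the loop" concrete, here is a minimal sketch of a cooperative loop built from plain generators and a simulated clock. It is not how gevent is implemented (gevent uses greenlets and a real event loop); ToyLoop, worker, and demo are names invented for this illustration.

```python
import heapq


class ToyLoop(object):
    """A toy event loop with a simulated clock (illustration only)."""

    def __init__(self):
        self.now = 0.0
        self._timers = []  # heap of (wake_time, sequence, task)
        self._seq = 0      # tie-breaker so tasks are never compared

    def spawn(self, task):
        self._schedule(self.now, task)

    def _schedule(self, when, task):
        heapq.heappush(self._timers, (when, self._seq, task))
        self._seq += 1

    def run(self):
        while self._timers:
            when, _, task = heapq.heappop(self._timers)
            self.now = max(self.now, when)  # advance the simulated clock
            try:
                delay = next(task)  # run the task until it asks to sleep
            except StopIteration:
                continue            # task finished
            self._schedule(self.now + delay, task)


def worker(name, delay, log, loop):
    for i in range(2):
        log.append((loop.now, name, i))
        yield delay  # cooperative sleep: hand control back to the loop


def demo():
    loop = ToyLoop()
    log = []
    loop.spawn(worker("a", 1.0, log, loop))
    loop.spawn(worker("b", 1.5, log, loop))
    loop.run()
    return log, loop.now
```

In this sketch both workers finish at simulated time 3.0, because each one's sleep overlaps with the other's work. If a worker blocked instead of yielding, the loop could not run anything else in the meantime, and the two workers would take 2.0 + 3.0 = 5.0 in total.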

Monkey-patching to the rescue

Gevent provides several "green" APIs that follow the pattern above, returning control to the event loop rather than blocking. Although you can technically build whatever you want out of these, there are some APIs in the standard library that have no equivalent in gevent. For instance, using any of the following modules in the standard library can completely freeze a gevent-based program:

  • urllib or urllib2
  • httplib
  • ftplib
  • poplib
  • imaplib
  • nntplib
  • smtplib
  • telnetlib
  • SocketServer
  • BaseHTTPServer
  • CGIHTTPServer
  • xmlrpclib
  • SimpleXMLRPCServer

Additionally, of course, if you use the standard library versions of socket or ssl rather than those included in gevent, you can end up with a globally-blocked program. Obviously the Python standard library provides a wealth of functionality that it would be nice to have available in a gevent-based program without giving up the cooperative concurrency in gevent. For this purpose, gevent provides the gevent.monkey module.

What gevent.monkey does is replace the basic blocking operations in the standard library with "greened" versions. For example, if you call gevent.monkey.patch_socket(), gevent will replace various functions and classes in the standard library socket module with the gevent versions.
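
The mechanism itself is just attribute replacement on a module object. The following is a rough sketch of that idea using stand-in modules rather than the real socket module or gevent; lowlevel, fetch, green_recv, and patch_lowlevel are all hypothetical names chosen for the illustration.

```python
import types

# Hypothetical stand-in for a low-level stdlib module such as socket.
lowlevel = types.ModuleType("lowlevel")


def blocking_recv():
    return "blocking recv"


lowlevel.recv = blocking_recv


def fetch():
    """Hypothetical stand-in for higher-level code built on lowlevel."""
    # The attribute is looked up at call time, so fetch() uses whatever
    # function is currently installed on the module.
    return lowlevel.recv()


def green_recv():
    return "green recv"


def patch_lowlevel():
    """Swap in the cooperative version, in the spirit of a patch_* function."""
    lowlevel.recv = green_recv


before = fetch()
patch_lowlevel()
after = fetch()
```

Here fetch() was never modified, yet it returns "green recv" after the patch, because it reaches the low-level function through the module attribute. That is the sense in which code layered on top of a patched module can become green without being rewritten.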

By patching the foundational modules like socket, ssl, and even thread, other modules that build on their functionality, like urllib or xmlrpclib, automatically become green. If you want to make sure you get them all, there's a nice function provided: gevent.monkey.patch_all(), which will patch the following modules in an attempt to "green" the whole standard library:

  • socket
  • ssl
  • os
  • time
  • select
  • thread and threading

In order to make sure that the monkey-patching works, you need to make sure you do it before any of the higher-level modules such as urllib are imported. The easiest way to do this is to start your top-level script with the following code:

import gevent.monkey; gevent.monkey.patch_all()

In most cases, this will "just work" for your program.
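
One plausible reason the ordering matters is that a module can capture a reference to a low-level function when it is imported (for example via a "from module import name" statement), and a later patch does not change that captured reference. The sketch below mimics this with stand-in objects instead of real modules; make_lowlevel, make_client, and patch are hypothetical helpers, not part of gevent.

```python
import types


def make_lowlevel():
    """Build a hypothetical stand-in for a low-level module like socket."""
    mod = types.ModuleType("lowlevel")
    mod.recv = lambda: "blocking"
    return mod


def make_client(lowlevel):
    """Mimic a module that runs `from lowlevel import recv` at import time."""
    recv = lowlevel.recv  # reference captured once, at "import"

    def fetch():
        return recv()

    return fetch


def patch(lowlevel):
    """Replace the blocking function with a cooperative one."""
    lowlevel.recv = lambda: "green"


# Patch first, then "import" the client: it captures the green function.
mod = make_lowlevel()
patch(mod)
patched_first = make_client(mod)()

# "Import" the client first, then patch: it keeps the blocking function.
mod = make_lowlevel()
early_client = make_client(mod)
patch(mod)
patched_late = early_client()
```

In this toy setup, the client created after patching returns "green", while the client created before patching still returns "blocking". Putting the patch call at the very top of the top-level script avoids having to reason about which modules capture references early.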

Conclusion

In the Python world, monkey-patching generally has a bad name, as it's seen as an ugly hack to modify loaded code. And it is, in fact, an ugly hack to modify loaded code, but sometimes (as in monkey-patching the standard library), it's the expedient thing to do.

So what do you think? Is the use of monkey-patching in gevent a reasonable compromise? If not, how would you build a system that requires some of the higher-level networking functionality in the standard library like xmlrpclib? I'd love to hear about it in the comments below!
