Monday, July 30, 2012

Gevent and Greenlets

Continuing my series on gevent and Python, this article gets into practical details: how to install gevent and get started with basic greenlet operations. If you're just getting started with gevent, you might want to read the previous articles in this series first.

And now that you're all caught up, let's get started with gevent...

Installing gevent

The first step to working with gevent is installing it. Luckily, if you're familiar with pip, it's a fairly straightforward process. Note, however, that gevent and its dependencies include C extension modules, so you'll need a C compiler available for the install to work. I did my installation using a virtual environment:

$ virtualenv .venv
New python executable in .venv/bin/python
Installing setuptools............done.
Installing pip...............done.
$ source .venv/bin/activate
(.venv) $ pip install gevent
... lots of install messages ...
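
If you'd like a quick sanity check that the install worked, a tiny script like the following should do. It's only a sketch: it assumes you run it with the Python interpreter from the virtual environment above, and the version string it prints depends on whichever release pip picked.

```python
# Quick sanity check; run with the interpreter from the activated virtualenv.
import gevent

# gevent exposes its release number as a string in __version__.
installed_version = gevent.__version__
print("gevent %s imported successfully" % installed_version)
```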

Now that you've got gevent installed, we'll move along to one of the most basic things you can do in gevent: creating greenlets.

Creating greenlets

Greenlets, sometimes referred to as "green threads," are lightweight structures that let you do cooperative multithreading in Python without the system overhead of real threads (which the thread and threading modules use). The main thing to keep in mind when dealing with greenlets is that a greenlet will never yield to another greenlet unless it calls some function in gevent that yields. (Real threads, by contrast, can be interrupted at somewhat arbitrary times by the operating system, causing a context switch.)

To create greenlets, you can use the gevent.spawn_* functions. The simplest is gevent.spawn:

>>> import gevent
>>> def simple_greenlet(*args, **kwargs):
...     print 'inside greenlet with', args, kwargs
...
>>> gevent.spawn(simple_greenlet, 1,2,3,foo=4)
<Greenlet at 0x10149acd0: simple_greenlet(1, 2, 3, foo=4)>
>>> gevent.sleep(1)
inside greenlet with (1, 2, 3) {'foo': 4}

Note in particular how the greenlet didn't do anything until we called sleep(). sleep() is one of the functions in gevent that yields to other greenlets. If you want to yield to other greenlets but don't want to wait a full second when nothing else is ready to run, you can call gevent.sleep(0).
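
To make the cooperative part concrete, here's a minimal sketch that runs the same pair of greenlets twice: once without yielding, and once calling gevent.sleep(0) after every step. The names run_pair and worker are my own, made up for this illustration. Without a yield, the first greenlet should run to completion before the second one gets a turn; with the yield, their steps should interleave.

```python
import gevent

def run_pair(yield_between_steps):
    """Run two greenlets and return the order in which their steps ran."""
    order = []

    def worker(name):
        for step in range(3):
            order.append((name, step))
            if yield_between_steps:
                # Give any other ready greenlet a chance to run.
                gevent.sleep(0)

    gevent.joinall([gevent.spawn(worker, 'a'), gevent.spawn(worker, 'b')])
    return order

greedy = run_pair(False)   # no yields: 'a' finishes all its steps first
polite = run_pair(True)    # yields after every step: steps interleave

print("greedy: %r" % (greedy,))
print("polite: %r" % (polite,))
```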

We can actually set up several greenlets to run concurrently and then sleep while they run:

>>> greenlets = [ gevent.spawn(simple_greenlet, x) for x in range(10) ]
>>> gevent.sleep(0)
inside greenlet with (0,) {}
inside greenlet with (1,) {}
...
inside greenlet with (8,) {}
inside greenlet with (9,) {}

If you're interested in waiting for the greenlets to complete, you can do so with the gevent.joinall function. joinall also takes an optional timeout parameter; if the greenlets haven't all completed by then, joinall stops waiting and returns. In the basic case, you just pass a list of Greenlet objects to joinall:

>>> greenlets = [ gevent.spawn(simple_greenlet, x) for x in range(10) ]
>>> gevent.joinall(greenlets)
inside greenlet with (0,) {}
inside greenlet with (1,) {}
...
inside greenlet with (8,) {}
inside greenlet with (9,) {}
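
The timeout variant looks something like the sketch below. The nap function and the specific durations are just assumptions for the demo. Since joinall gives up waiting once the timeout passes, I check ready() on each greenlet afterwards to see which ones actually finished.

```python
import gevent

def nap(seconds):
    # gevent.sleep yields to other greenlets while this one waits.
    gevent.sleep(seconds)
    return seconds

quick = gevent.spawn(nap, 0.01)
sluggish = gevent.spawn(nap, 5)

# Wait at most half a second for both greenlets.
gevent.joinall([quick, sluggish], timeout=0.5)

quick_done = quick.ready()
sluggish_done = sluggish.ready()
print("quick finished: %s" % quick_done)
print("sluggish finished: %s" % sluggish_done)

# Clean up the straggler; block=True waits until it has actually died.
sluggish.kill(block=True)
```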

By default, the "parent" greenlet that created a "child" greenlet won't receive any kind of feedback about the state of that child. If you'd like some feedback, there are other spawn_* functions you can use:

  • spawn_later(secs, function, *args, **kwargs) - This is the same as spawn except it waits the specified number of seconds before starting the child greenlet.
  • spawn_link(function, *args, **kwargs) - When the child greenlet dies (either due to an exception or due to successful completion), a gevent.greenlet.LinkedExited exception subclass will be raised in the parent, either a LinkedCompleted on successful completion, LinkedFailed on an unhandled exception in the child, or LinkedKilled if the child was killed by another greenlet.
  • spawn_link_exception(function, *args, **kwargs) - Just like linking the child, but the parent is only notified when the child dies due to an unhandled exception.
  • spawn_link_value(function, *args, **kwargs) - Just like linking the child, but the parent is only notified when the child completes successfully.

Normally, I use spawn_link_exception to make sure that the greenlet doesn't die unexpectedly without notifying its parent.
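
Of the functions listed above, spawn_later is the easiest to show in isolation. Here's a rough sketch; seconds_since is a hypothetical helper, and the 0.2 second delay is arbitrary. The call returns a Greenlet right away, but the child doesn't run until the delay has passed.

```python
import time
import gevent

def seconds_since(start):
    # Report how long after 'start' this function was actually called.
    return time.time() - start

# Schedule the call roughly 0.2 seconds from now; spawn_later returns immediately.
delayed = gevent.spawn_later(0.2, seconds_since, time.time())
ran_immediately = delayed.ready()   # nothing has run yet

delayed.join()
print("child ran after about %.2f seconds" % delayed.value)
```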

Greenlet objects

You probably noticed in the code above that the spawn_* functions return Greenlet objects. You can also construct the objects manually. The big difference between building a Greenlet this way versus with the spawn_* functions is that the Greenlet doesn't start executing automatically:

>>> gl = gevent.Greenlet(simple_greenlet, 'In a Greenlet')
>>> gevent.sleep(1)
>>> gl.start()
>>> gevent.sleep(1)
inside greenlet with ('In a Greenlet',) {}

There are several useful attributes and methods on these objects that you can use to interact with a greenlet:

  • value - If a greenlet completes successfully and returns a value, it will be stored in this instance variable.
  • ready() - Returns True if the greenlet has finished execution (is the result "ready"?), either successfully or due to an error.
  • successful() - Returns True only if the greenlet completed successfully.
  • start() - Start the greenlet
  • run() - Run the greenlet (like gl.start(); gevent.sleep(0))
  • start_later(secs) - Schedule the greenlet to start later
  • join(timeout=None) - Wait until the greenlet completes or the timeout expires.
  • get(block=True, timeout=None) - Returns the return value of a greenlet or, if the greenlet raised an unhandled exception, reraises it here. If block==False and the greenlet is still running, raises gevent.Timeout. Otherwise, waits until the greenlet exits or the timeout expires (in which case gevent.Timeout is raised).
  • kill(exception=GreenletExit, block=False, timeout=None) - Raises an exception in the context of another greenlet. By default, this is GreenletExit. With block==True, this function will wait for the greenlet to die or for the timeout to expire.
  • link(receiver=None) (also link_exception, and link_value) - With the default value of None, these raise exceptions in the linking greenlet similar to the spawn_link_* functions. If you provide a Greenlet as the receiver, then the exception will be raised in that greenlet's context. If you provide a function as the receiver, it will be called with the linked greenlet as its sole parameter.
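
Here's a small sketch that exercises a few of these: value, ready(), successful(), get(), and link_exception with a function as the receiver. The functions double, broken, and on_failure are made up for the example. I've stuck to a function receiver because that form doesn't depend on exceptions being raised in the parent, which makes it easier to follow in a short script.

```python
import gevent

def double(x):
    return x * 2

def broken():
    raise ValueError("boom")

failures = []  # filled in by the link_exception callback below

def on_failure(greenlet):
    # A function receiver is called with the linked greenlet as its only argument.
    failures.append(greenlet)

ok = gevent.spawn(double, 21)
bad = gevent.spawn(broken)
bad.link_exception(on_failure)

# gevent reports the unhandled ValueError on stderr when 'bad' dies.
gevent.joinall([ok, bad])

print("ok:  ready=%s successful=%s value=%r" % (ok.ready(), ok.successful(), ok.value))
print("bad: ready=%s successful=%s" % (bad.ready(), bad.successful()))

# get() re-raises the child's exception in the caller.
try:
    bad.get()
except ValueError as exc:
    reraised = str(exc)

# get(block=False) on a greenlet that hasn't finished raises gevent.Timeout.
pending = gevent.spawn(gevent.sleep, 10)
try:
    pending.get(block=False)
    timed_out = False
except gevent.Timeout:
    timed_out = True
pending.kill()
```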

Limiting concurrency

Although greenlets are quite low-overhead, there may be some cases in which you wish to limit the number of greenlets created. gevent includes a Pool class (in the gevent.pool module) that has the same spawn_* API that we saw earlier, as well as a few extra methods:

  • wait_available() waits until the pool has at least one free slot for a new greenlet.
  • full() returns True if the pool has no free slots, so spawning another greenlet would block.
  • free_count() returns the number of free slots, that is, how many more greenlets can be spawned before the pool is full.

Pool is actually a subclass of Group, which has various methods for adding greenlets to and removing them from the group, as well as some map_* and apply_* methods, but I won't get into them here.
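
As a rough illustration of a pool in action, the sketch below pushes six tasks through a pool of size two. The task function and the running and peak counters are hypothetical names I added just to observe the concurrency. When the pool is full, pool.spawn waits for a free slot, so the number of tasks running at once should never exceed the pool size.

```python
import gevent
from gevent.pool import Pool

pool = Pool(2)   # at most two greenlets running at a time
running = [0]    # tasks currently in progress
peak = [0]       # highest value 'running' ever reached

def task(n):
    running[0] += 1
    peak[0] = max(peak[0], running[0])
    gevent.sleep(0.01)   # pretend to do some I/O, yielding meanwhile
    running[0] -= 1
    return n * n

# pool.spawn blocks whenever the pool is full, throttling this loop.
greenlets = [pool.spawn(task, n) for n in range(6)]
pool.join()      # wait for everything in the pool to finish

results = [g.value for g in greenlets]
print("results: %r" % (results,))
print("peak concurrency: %d" % peak[0])
print("free slots after join: %d" % pool.free_count())
```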

Conclusion

There's a lot more to gevent that I'll cover in future articles, particularly in the realm of building network servers, but hopefully this article gives you a feel for the basic concurrency abstractions underlying gevent. So what do you think? Is gevent already part of your Python toolkit? Interested in trying it out? I'd love to hear what you think in the comments below!
