Just a little Python: Introduction to Gevent

Monday, July 23, 2012

Introduction to Gevent

In a previous post, I described how to build a web chat server with socket.io and gevent. If you're actually trying to learn gevent, however, socket.io is probably not the best place to start. So I figured I'd write this post and provide an overview of gevent.

[Update 2012-07-24] In response to some criticisms of the micro-benchmarks in this post, I reworked the benchmarks and wrote an updated gevent and threads post. Make sure you read that one for more perspective on greenlets vs. threads.

What is Gevent, anyway?

According to the gevent webpage,

gevent is a coroutine-based Python networking library that uses greenlet to provide a high-level synchronous API on top of the libevent event loop.

That's a succinct definition, and it identifies all the technologies and implementation architecture of gevent, but it doesn't really give a good "beginner's view." The quickest way I can think of to define it is to say

Gevent gives you threads, but without using threads.

Why not just use threads?

So why not just use threads, then? The biggest drawback of threads for me is that they're relatively resource-intensive compared to greenlets, the thread-like abstraction used in gevent. For instance, here's a minimal program that simulates a "Hello World" webserver, but without any concurrency:

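import sys
import time
import socket

def sequential(port):
    # Accept one connection at a time; each request is fully
    # handled before the next one is accepted.
    s = socket.socket()
    s.bind(('0.0.0.0', port))
    s.listen(500)
    while True:
        cli, addr = s.accept()
        handle_request(cli, time.sleep)

def handle_request(s, sleep):
    # `sleep` is passed in so the caller can choose a blocking
    # or a cooperative sleep function.
    try:
        s.recv(1024)
        sleep(0.1)  # simulate 100 ms of work per request
        s.sendall(b'HTTP/1.0 200 Ok\r\n\r\nHello world')
        s.shutdown(socket.SHUT_WR)
        sys.stdout.write('. ')
    except Exception as ex:
        sys.stdout.write('e %s ' % ex)
    finally:
        sys.stdout.flush()
        s.close()

if __name__ == '__main__':
    sequential(int(sys.argv[1]))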

The only thing 'special' about this script is that it does a few things to slow down the handle_request function to make it (somewhat) more realistic. If we run this under Apache's benchmarking tool with lots of concurrency, however, we get abysmal results. For instance, running with ab -r -n 200 -c 200 http://localhost:... gives me around 11 requests per second with lots of errors.

Maybe we can do better with threads? We can replace the sequential function with a threads function:

import threading

def threads(port):
    s = socket.socket()
    s.bind(('0.0.0.0', port))
    s.listen(500)
    while True:
        cli, addr = s.accept()
        t = threading.Thread(target=handle_request, args=(cli, time.sleep))
        t.daemon = True
        t.start()

Running that with ab -r -n 200 -c 200... gives me even worse results; the benchmark simply refuses to finish, bailing out after 10 errors. Well, it turns out we can use gevent to give us threadlike behavior without the overhead of threads:

import gevent

def greenlet(port):
    from gevent import socket
    s = socket.socket()
    s.bind(('0.0.0.0', port))
    s.listen(500)
    while True:
        cli, addr = s.accept()
        gevent.spawn(handle_request, cli, gevent.sleep)

Now with the same ab parameters we get... 1487 requests per second, which is about what the threading demo would get if we hadn't crushed it by sending 200 requests at it concurrently.

Why not always use gevent/greenlets?

So why not always use the greenlets in gevent? Mainly, it comes down to a question of preemption. Greenlets use cooperative multitasking, whereas threads use preemptive multitasking. This means that a greenlet will never stop executing and "yield" to another greenlet unless it calls certain "yielding" functions (like gevent.socket.socket.recv or gevent.sleep). Threads, on the other hand, will yield to other threads (sometimes unpredictably) whenever the operating system decides to swap them out.
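To make the cooperative model concrete, here is a small sketch that uses plain Python generators as a stand-in for greenlets. It is only an analogy: gevent is not implemented this way, and the names task, hog, and run are invented for this example. The point it tries to show is that a task which never reaches a yield point keeps everyone else waiting.

```python
from collections import deque

def task(name, steps, log):
    """A well-behaved task: yields control after every step."""
    for i in range(steps):
        log.append((name, i))
        yield  # explicit yield point, loosely analogous to gevent.sleep(0)

def hog(name, steps, log):
    """A badly-behaved task: does all its work before yielding."""
    for i in range(steps):
        log.append((name, i))  # no yield point inside the loop
    yield

def run(tasks):
    """Round-robin scheduler: resume each task until all have finished."""
    queue = deque(tasks)
    while queue:
        current = queue.popleft()
        try:
            next(current)          # run until the task's next yield
        except StopIteration:
            continue               # task finished; drop it
        queue.append(current)      # otherwise, back of the line

polite_log = []
run([task('a', 2, polite_log), task('b', 2, polite_log)])
# polite_log interleaves: a, b, a, b

hog_log = []
run([hog('hog', 3, hog_log), task('b', 2, hog_log)])
# hog_log shows 'hog' finishing all three steps before 'b' runs at all
```

With real greenlets the same thing happens whenever a handler does a long computation, or calls a blocking function that gevent doesn't know about, without ever reaching a yielding call.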

Of course, if you've been using Python for a while, you've heard something about a global interpreter lock (GIL) in Python that only allows a single thread to be executing Python bytecode at a time. So although you have threads in Python, and they give some concurrency (depending on whether the particular extension library you're using releases the GIL appropriately), threads provide less benefit than you might expect coming from C or Java.

So what else is in Gevent?

Hopefully I've given you some interest in learning more about gevent as well as some of the reasoning behind its existence. Some of the other goodies you'll find in gevent include

Functions to monkey-patch the standard library so that, for example, you can use socket.socket rather than gevent.socket.socket
Basic servers for handling socket-based connections with your own handlers
More fine-grained control over the greenlets you spawn
Synchronization primitives suitable for use with greenlets
Greenlet pools
Greenlet-local objects (like threadlocal, but with greenlets)
Two greenlet-based WSGI servers
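As a rough taste of two of these items (greenlet pools and finer-grained control over spawned greenlets), here is a minimal sketch. It assumes gevent is installed; the function square_later, the pool size of 3, and the 0.01 second delay are made up for illustration and don't come from any benchmark in this post.

```python
import gevent
from gevent.pool import Pool

def square_later(n):
    # gevent.sleep yields to other greenlets; it stands in for network I/O
    gevent.sleep(0.01)
    return n * n

# A pool caps how many greenlets run at once (here, at most 3)
pool = Pool(3)
squares = pool.map(square_later, range(6))  # results come back in input order

# spawn returns a Greenlet object that you can join and inspect
g = gevent.spawn(square_later, 7)
g.join()
result = g.value  # the return value of square_later(7)
```

A pool is one way to avoid the "spawn one worker per request, no matter how many arrive" pattern used in the server examples above, since it bounds how much work is in flight at any moment.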

In future posts, I'll give more detail about how to use gevent productively. So what do you think? Is gevent something you've already got in your toolbox? Does its ability to handle concurrency interest you? Any projects already using gevent? I'd love to hear about it in the comments below!

8 comments:

Anonymous, 1:29 PM
The thread-based example makes very little sense and greatly detracts from the point. A thread-based solution would never spawn a thread for each request in any kind of external-facing system (werkzeug's testing server does that, but it's just for testing, in order to force concurrent access). You're essentially trying to spawn 200 threads at the same time.

The test, as designed, has nothing to do with performance or even with threads. It's terrible code failing to a DoS attack.
Anonymous, 9:31 AM
I've wanted to try gevent for quite some time, thx for the intro.
Even considering what was said about threads, I wanted to see how it works for me. Using your code (only arranged differently because I want to try more things later), I got results for threads that were not that horrible: not as good as greenlets, but way better than calling the function directly (which is kind of strange, or not?). Am I missing something?
Code: https://gist.github.com/3169888
Results: https://gist.github.com/3169926

