In a previous post, I described how to build a web chat server with socket.io and gevent. If you're actually trying to learn gevent, however, socket.io is probably not the best place to start. So I figured I'd write this post to provide an overview of gevent.
[Update 2012-07-24] In response to some criticisms of the micro-benchmarks in this post, I reworked the benchmarks and wrote an updated gevent and threads post. Make sure you read that one for more perspective on greenlets vs. threads.
What is Gevent, anyway?
According to the gevent webpage,
gevent is a coroutine-based Python networking library that uses greenlet to provide a high-level synchronous API on top of the libevent event loop.
That's a succinct definition, and it identifies all the technologies and implementation architecture of gevent, but it doesn't really give a good "beginner's view." The quickest way I can think of to define it is to say
Gevent gives you threads, but without using threads.
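To give a feel for what that means in practice, here's a minimal sketch of spawning and joining greenlets much as you would threads. The fetch and run_all names and the 0.05 second delay are made up for illustration; only gevent.spawn, gevent.joinall, gevent.sleep and the greenlet's value attribute come from gevent itself.

```python
import gevent


def fetch(name, delay):
    """Pretend to do some I/O; gevent.sleep yields to other greenlets."""
    gevent.sleep(delay)
    return f"{name} done"


def run_all():
    # spawn() schedules each greenlet, much like threading.Thread(...).start()
    jobs = [gevent.spawn(fetch, f"job-{i}", 0.05) for i in range(3)]
    # joinall() waits for all of them, much like calling Thread.join() on each
    gevent.joinall(jobs)
    # .value holds whatever the greenlet's function returned
    return [job.value for job in jobs]


if __name__ == "__main__":
    print(run_all())
```

The three simulated waits overlap instead of running back to back, even though no OS threads are created.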
Why not just use threads?
So why not just use threads, then? The biggest drawback of threads for me is that they're relatively resource-intensive compared to greenlets, the thread-like abstraction used in gevent. For instance, here's a minimal program that simulates a "Hello World" webserver, but without any concurrency:
    import socket
    import sys
    import time


    def sequential(port):
        # Accept connections and handle them one at a time
        s = socket.socket()
        s.bind(('0.0.0.0', port))
        s.listen(500)
        while True:
            cli, addr = s.accept()
            handle_request(cli, time.sleep)


    def handle_request(s, sleep):
        try:
            s.recv(1024)
            sleep(0.1)  # simulate 100 ms of work per request
            s.send(b'HTTP/1.0 200 Ok\r\n\r\nHello world')
            s.shutdown(socket.SHUT_WR)
            print('.', end=' ')
        except Exception as ex:
            print('e', ex, end=' ')
        finally:
            sys.stdout.flush()
            s.close()


    if __name__ == '__main__':
        # The port to listen on is the first command-line argument
        sequential(int(sys.argv[1]))
The only thing 'special' about this script is that it does a few things to slow
down the handle_request function to make it (somewhat) more realistic. If we run
this under Apache's benchmarking tool with lots of concurrency, however, we get
abysmal results. For instance, running with
ab -r -n 200 -c 200 http://localhost:... gives me around 11 requests per second
with lots of errors.
Maybe we can do better with threads? We can replace the
sequential function with
    import threading


    def threads(port):
        s = socket.socket()
        s.bind(('0.0.0.0', port))
        s.listen(500)
        while True:
            cli, addr = s.accept()
            # One daemon thread per connection
            t = threading.Thread(target=handle_request, args=(cli, time.sleep))
            t.daemon = True
            t.start()
Running that with
ab -r -n 200 -c 200... gives me even worse results; the
benchmark simply refuses to finish, bailing out after 10 errors. Well, it turns
out we can use gevent to give us threadlike behavior without the overhead of
threads:
    import gevent


    def greenlet(port):
        from gevent import socket
        s = socket.socket()
        s.bind(('0.0.0.0', port))
        s.listen(500)
        while True:
            cli, addr = s.accept()
            # One greenlet per connection; gevent.sleep yields instead of blocking
            gevent.spawn(handle_request, cli, gevent.sleep)
Now with the same
ab parameters we get... 1487 requests per second, which
is about what the threading demo would get if we hadn't crushed it by sending 200
requests at it concurrently.
Why not always use gevent/greenlets?
So why not always use the greenlets in gevent? Mainly, it comes down to a question of
preemption. Greenlets use cooperative multitasking, whereas threads use preemptive
multitasking. What this means is that a greenlet will never stop executing and
"yield" to another greenlet unless it calls certain "yielding" functions (like
gevent.sleep, or blocking I/O on a gevent socket). Threads, on the other hand, will
yield to other threads (sometimes unpredictably) based on when the operating
system decides to swap them out.
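To make the difference concrete, here's a small sketch of two greenlets sharing a log. The worker and run names and the cooperative flag are invented for this example; the gevent calls are gevent.spawn, gevent.joinall and gevent.sleep. When the workers never yield, the first one runs to completion before the second gets a turn. When they call gevent.sleep(0), they take turns.

```python
import gevent


def worker(name, log, cooperative):
    for i in range(3):
        log.append(f"{name}{i}")
        if cooperative:
            # Explicitly hand control to any other greenlet that is ready to run
            gevent.sleep(0)


def run(cooperative):
    log = []
    jobs = [gevent.spawn(worker, name, log, cooperative) for name in ("a", "b")]
    gevent.joinall(jobs)
    return log


if __name__ == "__main__":
    print("no yielding:  ", run(False))
    print("with yielding:", run(True))
```

Without yielding, run(False) returns ['a0', 'a1', 'a2', 'b0', 'b1', 'b2']: greenlet "b" is starved until "a" finishes. That is the flip side of cooperative multitasking: a greenlet stuck in a long CPU-bound loop holds up everything else.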
Of course, if you've been using Python for a while, you've heard something about a global interpreter lock (GIL) in Python that only allows a single thread to be executing Python bytecode at a time. So although you have threads in Python, and they give some concurrency (depending on whether the particular extension library you're using releases the GIL appropriately), threads provide less benefit than you might expect coming from C or Java.
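If you'd like to see the GIL's effect for yourself, here's a rough sketch using only the standard library. The count_down, timed_sequential and timed_threaded names are made up for this example, and the actual timings will depend on your machine and Python build, so treat it as an experiment to run and not a benchmark.

```python
import threading
import time


def count_down(n):
    """Pure-Python CPU-bound loop; it holds the GIL while it runs."""
    while n > 0:
        n -= 1


def timed_sequential(n, workers):
    # Run the same work `workers` times, one after another
    start = time.perf_counter()
    for _ in range(workers):
        count_down(n)
    return time.perf_counter() - start


def timed_threaded(n, workers):
    # Run the same work in `workers` threads at once
    threads = [threading.Thread(target=count_down, args=(n,))
               for _ in range(workers)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 5_000_000
    print("sequential: %.2fs" % timed_sequential(n, 2))
    print("threaded:   %.2fs" % timed_threaded(n, 2))
```

On a standard CPython build with the GIL, I'd expect the threaded run to be no faster than the sequential one for this kind of pure-Python work, since only one thread can execute bytecode at a time.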
So what else is in Gevent?
Hopefully I've given you some interest in learning more about gevent as well as some of the reasoning behind its existence. Some of the other goodies you'll find in gevent include
- Functions to monkey-patch the standard library so you can use socket rather than gevent.socket, for example
- Basic servers for handling socket-based connections with your own handlers
- More fine-grained control over the greenlets you spawn
- Synchronization primitives suitable for use with greenlets
- Greenlet pools
- Greenlet-local objects (like threadlocal, but with greenlets)
- Two greenlet-based WSGI servers
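As a small taste of two of these, here's a sketch that combines a greenlet pool with a greenlet-local object. The handle and run_jobs names, the context object, and the numbers are all invented for illustration; Pool comes from gevent.pool and local from gevent.local.

```python
import gevent
from gevent.local import local
from gevent.pool import Pool

# Each greenlet sees its own set of attributes on this object
context = local()


def handle(job_id):
    context.job_id = job_id      # visible only to the current greenlet
    gevent.sleep(0.01)           # yield so other greenlets run and set their own value
    return context.job_id * 2    # still this greenlet's own job_id


def run_jobs(n, limit):
    pool = Pool(limit)           # at most `limit` greenlets run at once
    return pool.map(handle, range(n))  # results come back in input order


if __name__ == "__main__":
    print(run_jobs(6, 2))
```

Even though the greenlets overlap and all write to context.job_id, each one reads back its own value, which is the same guarantee thread-local storage gives you with threads.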
In future posts, I'll give more detail about how to use gevent productively. So what do you think? Is gevent something you've already got in your toolbox? Does its ability to handle concurrency interest you? Any projects already using gevent? I'd love to hear about it in the comments below!