In a previous post, I gave an introduction to gevent to show some of the benefits your application might get from using gevent greenlets instead of threads. Some people, however, took issue with my benchmark code, saying that the threaded example was contrived. In this post, I'll try to answer some of the objections.
(It actually turns out that there was a bug in the version of ab I was using to test, as well, so I re-ran the tests from the previous post, too.)
Threads versus Greenlets
Initially, I had proposed a dummy webserver that handled incoming requests by creating a thread and delegating communication to that thread. The code in question is below:
    import socket
    import threading
    import time

    # handle_request is the request handler from the previous post.
    def threads(port):
        s = socket.socket()
        s.bind(('0.0.0.0', port))
        s.listen(500)
        while True:
            cli, addr = s.accept()
            # One new OS thread per accepted connection.
            t = threading.Thread(target=handle_request, args=(cli, time.sleep))
            t.daemon = True
            t.start()
When I could get the code above to actually run the full benchmark (which it didn't often do), it ended up getting around 1300-1400 requests per second. The gevent version looked very similar:
    import gevent

    def greenlet(port):
        from gevent import socket
        s = socket.socket()
        s.bind(('0.0.0.0', port))
        s.listen(500)
        while True:
            cli, addr = s.accept()
            # One new greenlet per accepted connection.
            gevent.spawn(handle_request, cli, gevent.sleep)
This code was able to handle closer to 1600 requests per second. Maybe I should have called it out better, but the fact that the gevent version performed better than the threaded version does point out an important aspect of gevent:
Greenlets are significantly lighter-weight than true threads, particularly when creating them.
However, some folks objected by pointing out that you just don't do that with threads. Nobody does. It's a dumb way to design a server. I agree with all these points, though that wasn't really the point I was going for. One thing I will point out is that:
The reason you don't design threaded servers so they fork a thread each time you get a connection is that threads are expensive to fork, unlike greenlets.
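If you'd like to check the creation-cost claim on your own machine, here is a rough timing sketch. It is written for Python 3, and the function names (noop, time_threads, time_greenlets) are made up for illustration. It assumes gevent may or may not be installed, and it only measures creating, starting, and joining do-nothing workers, so treat the numbers as indicative rather than as a benchmark.

```python
import threading
import time

try:
    import gevent  # third-party; optional for this sketch
except ImportError:
    gevent = None


def noop():
    """A worker that does nothing, so only creation overhead is measured."""
    pass


def time_threads(n):
    """Seconds to create, start, and join n OS threads that do nothing."""
    start = time.perf_counter()
    threads = [threading.Thread(target=noop) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


def time_greenlets(n):
    """Seconds to spawn and join n greenlets, or None if gevent is missing."""
    if gevent is None:
        return None
    start = time.perf_counter()
    gevent.joinall([gevent.spawn(noop) for _ in range(n)])
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 1000
    print("threads:   %.4f s" % time_threads(n))
    g = time_greenlets(n)
    print("greenlets: %s" % ("gevent not installed" if g is None else "%.4f s" % g))
```

The results depend heavily on the OS and the Python version, so no expected numbers are given here.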
Fixing the benchmark
So anyway, to "fix" the benchmark so it's a little more fair to threads, we'll use a thread pool to create all the threads up-front and then use a Queue to send work to them. Our server core now looks like this:
    import socket
    import threading
    import time
    from Queue import Queue  # Python 2; the module is named queue in Python 3

    def threads(port, N=10):
        s = socket.socket()
        s.bind(('0.0.0.0', port))
        s.listen(500)
        q = Queue()
        # Start all N workers up front; each one blocks on the shared queue.
        for x in xrange(N):
            t = threading.Thread(target=thread_worker, args=(q,))
            t.daemon = True
            t.start()
        print 'Ready and waiting with %d threads on port %d' % (
            N, port)
        while True:
            cli, addr = s.accept()
            q.put(cli)

    def thread_worker(q):
        while True:
            sock = q.get()
            handle_request(sock, time.sleep)
If I now run this with a thread pool of 200 threads, I can indeed finish the benchmark (ApacheBench, run as ab -r -n 2000 -c 200...) with around 1300 requests per second (a little less than before, probably due to the synchronization overhead of the Queue). So updating the benchmark to use a thread pool did not improve the performance. The equivalent gevent code uses a gevent Pool:
    def greenlet(port, N=10):
        from gevent.pool import Pool
        from gevent import socket, sleep
        pool = Pool(N)
        s = socket.socket()
        s.bind(('0.0.0.0', port))
        s.listen(500)
        while True:
            cli, addr = s.accept()
            # pool.spawn waits for a free slot once N greenlets are running.
            pool.spawn(handle_request, cli, sleep)
Running ab with the same parameters, I now get... around 1200-1400 requests per second.
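All of the servers above hand each accepted socket to handle_request, which comes from the previous post and isn't reproduced here. If you want to run the listings yourself, a minimal stand-in might look like the sketch below. It is hypothetical, not the original handler: the fixed reply, the delay parameter and its default, and the demo helper are all assumptions made for illustration, and it is written for Python 3 (hence the bytes literals).

```python
import socket
import time

# Assumed fixed reply; the original handler's response is not shown in this post.
RESPONSE = b"HTTP/1.0 200 OK\r\nContent-Length: 2\r\n\r\nok"


def handle_request(sock, sleep, delay=0.01):
    """Hypothetical stand-in: read one request, pause, send a fixed reply."""
    sock.recv(4096)         # read (and ignore) the request bytes
    sleep(delay)            # simulated work; pass time.sleep or gevent.sleep
    sock.sendall(RESPONSE)
    sock.close()


def demo():
    """Drive handle_request over a local socket pair instead of a real server."""
    client, server = socket.socketpair()
    client.sendall(b"GET / HTTP/1.0\r\n\r\n")
    handle_request(server, time.sleep)
    reply = client.recv(4096)
    client.close()
    return reply
```

Passing the sleep function in as an argument is what lets the same handler serve both versions: the threaded server passes time.sleep, which blocks the OS thread, while the gevent server passes gevent.sleep, which yields to other greenlets.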
So why use gevent, again?
So yes, if I design the benchmark to omit the thread/greenlet creation entirely, threads and greenlets do indeed perform about the same. The big win for greenlets is when your thread pool isn't big enough to handle the concurrent connections.
It turns out that there's a clever denial-of-service attack on web servers known as slowloris that consumes threads from your thread pool quickly. Once your server's threads are all busy handling the slowloris requests, no further work can be done, and you end up with a very lightly loaded but still unresponsive server.
To illustrate this, we can run our thread-pool benchmark with only 20 threads in the pool and with the request handler modified to take five seconds per request. We'll also modify the benchmark line to allow more time for responses:
$ ab -n 2000 -c 200 -r -t 60 http://127.0.0.1:...
Now our threaded example ends up timing out connections as it tries to service 200 concurrent connections, each taking five seconds, with only 20 worker threads. If we go back to our naive (un-pooled) gevent example, however, we're able to achieve 47 requests per second, which is close to the theoretical maximum of 50 requests per second, with a very light server load.
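To see why the 20-thread pool falls over, it helps to work out its ceiling. If each worker blocks for the whole request, a pool of N workers can finish at most N / d requests per second when each request takes d seconds. With 20 workers and five-second requests, that's 4 requests per second, no matter how many clients are waiting. The sketch below checks that bound on a scaled-down pool. It is written for Python 3 with the standard library only, and pool_ceiling and run_pool are names made up for illustration; the sleep stands in for a slow client and is an assumption, not the benchmark's real handler.

```python
import queue
import threading
import time


def pool_ceiling(workers, seconds_per_request):
    """Upper bound on requests/second for a pool of blocking workers."""
    return workers / float(seconds_per_request)


def run_pool(workers, jobs, seconds_per_request):
    """Push `jobs` slow requests through a fixed pool; return elapsed seconds."""
    q = queue.Queue()

    def worker():
        while True:
            item = q.get()
            if item is None:                 # sentinel: no more work
                return
            time.sleep(seconds_per_request)  # stand-in for a slow client

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    start = time.monotonic()
    for t in threads:
        t.start()
    for _ in range(jobs):
        q.put(object())
    for _ in threads:
        q.put(None)                          # one sentinel per worker
    for t in threads:
        t.join()
    return time.monotonic() - start


if __name__ == "__main__":
    print("ceiling for 20 workers at 5 s each: %.1f req/s" % pool_ceiling(20, 5))
    elapsed = run_pool(workers=4, jobs=16, seconds_per_request=0.05)
    print("16 jobs through 4 workers took %.2f s" % elapsed)
```

In the scaled-down run, 16 jobs through 4 workers at 0.05 seconds each cannot finish in much less than 0.2 seconds, however fast the jobs are queued. Adding workers raises the ceiling but never removes it, which is the weakness a slowloris-style client exploits.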
The point? A slowloris attack will be able to eat up all the threads in your (finite-sized) thread pool, regardless of how big that pool is. Spawning a greenlet each time you receive a connection means you waste (almost) no resources waiting on IO.
There's a good bit more to gevent that I'd like to cover in future posts, but for now the points I'd like to leave you with are the following:
- You shouldn't be spawning something expensive like a thread for each incoming connection. It eats up various types of server resources.
- You shouldn't rely on thread pools to protect you from resource exhaustion, because they can fall victim to the slowloris attack.
- Gevent greenlets are lightweight enough that you can spawn one for each connection, and you don't have to rely on a pool (which can become exhausted in a slowloris type attack).
So what do you think? Have I convinced you? I'd love to hear your reaction in the comments below!