Friday, February 17, 2006

Very Simple WSGI Overview

In all the posting and hype about full-featured frameworks, you may have overlooked a very small "un-framework", the Python Web Server Gateway Interface (WSGI). It's generally an option for deploying the large frameworks such as TurboGears or Django. What follows is a very simple and brief overview of how you can create a WSGI-compliant application server.

First off, the WSGI specification itself (PEP 333) is a decent read, and I'd be remiss if I didn't at least mention it. Now, on to the simple overview!

Overview


First off, you need to realize that WSGI is exactly what its name implies: an interface. The best way I've found to think of it is "CGI for Python." In CGI, the shell is invoked to run some script. The shell's environment is populated with values from the HTTP request, and the script's output is returned to the client. WSGI is similar, substituting a Python function for the script, a Python dict for the shell environment, and skipping the shell altogether. A basic WSGI application server has the following outline:

import sys

def MyApplication(environ, start_response):
    try:
        # ... maybe do some stuff in response to the environ arg ...
        # Send the headers (the list may hold more than one header tuple).
        write_fn = start_response('200 OK', [('Content-type', 'text/html')])
        # ... maybe do some more stuff ...
        # EITHER: yield pieces of the response body
        # OR:     return an iterable of them (shown here)
        # OR:     write_fn(response_text)  # deprecated
        return [response_text]  # response_text: placeholder for the page body
    except:
        start_response('500 OOPS', [('Content-type', 'text/html')], sys.exc_info())
        # ... yield, write, or return the text of the error page ...
        return [error_text]  # error_text: placeholder for the error page

Your application server, then, is just a function (or other callable) that takes two arguments, an "environment" and a "start_response". In the recommended implementation, your server will either return an iterable (generally a list of strings) or be a generator function, so that calling it produces an iterable. The minimal "hello, world!" application is below:

def MyApplication(environ, start_response):
    start_response('200 OK', [('Content-type', 'text/plain')])
    # Body chunks are byte strings; the b prefix makes that explicit on Python 3.
    yield b"Hello, world!"
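To see the interface from the server's side, you can call the application by hand. The sketch below is not a real server: `hello_app` mirrors the application above, `call_app` and its fake `start_response` are hypothetical names made up for illustration, and it assumes Python 3, where body chunks are byte strings. The only library call is `wsgiref.util.setup_testing_defaults`, which fills in a plausible environ.

```python
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    """Same shape as the "hello, world!" application (bytes body for Python 3)."""
    start_response('200 OK', [('Content-type', 'text/plain')])
    yield b"Hello, world!"


def call_app(app, environ=None):
    """Hypothetical helper: invoke a WSGI app once and collect what it produces."""
    environ = dict(environ or {})
    setup_testing_defaults(environ)   # fills in REQUEST_METHOD, wsgi.input, etc.
    captured = {}

    def start_response(status, headers, exc_info=None):
        # A real server would send these to the client; here we just record them.
        captured['status'] = status
        captured['headers'] = headers
        return lambda data: None      # stand-in for the deprecated write() callable

    result = app(environ, start_response)
    try:
        body = b"".join(result)       # iterating is what runs a generator's body
    finally:
        if hasattr(result, 'close'):
            result.close()
    return captured['status'], captured['headers'], body


if __name__ == '__main__':
    print(call_app(hello_app))
```

Note that with a generator, nothing runs (not even `start_response`) until the server starts iterating over the result, which is why the helper reads the captured status only after joining the body.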

The "environment" is just a dict, mostly of strings, much like the CGI environment. The values available are summarized below. The "start_response" is a callable that your server must call to send the HTTP headers. You can call it up to twice, once for "normal" headers, and once for "error" headers. If you call it a second time, you must call it with an "exc_info" tuple (the result of sys.exc_info()), and it only helps if no output has been sent yet. In that case the original headers (if there were any) are replaced by the new headers; if output has already gone out, the server re-raises the exception instead.
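The two-call pattern might look like the following. This is a minimal sketch, not a definitive implementation: `fragile_app` and `recording_start_response` are hypothetical names, and the recorder only imitates the one rule discussed above (a second call must carry exc_info) rather than everything a real server does.

```python
import sys


def fragile_app(environ, start_response):
    """Hypothetical app: fails after choosing its 'normal' headers."""
    try:
        start_response('200 OK', [('Content-type', 'text/html')])
        raise ValueError("something broke while building the page")
    except Exception:
        # Second call: allowed only because exc_info is supplied, and only
        # useful because no body has been produced yet.
        start_response('500 Internal Server Error',
                       [('Content-type', 'text/plain')],
                       sys.exc_info())
        return [b"Sorry, something went wrong."]


def recording_start_response(log):
    """Hypothetical stand-in for a server's start_response; records each call."""
    def start_response(status, headers, exc_info=None):
        if log and exc_info is None:
            raise AssertionError("a second call requires exc_info")
        log.append((status, headers))
        return lambda data: None      # stand-in for the deprecated write() callable
    return start_response


if __name__ == '__main__':
    calls = []
    body = fragile_app({}, recording_start_response(calls))
    print(calls[-1][0], body)
```

The application is written as a plain function returning a list rather than a generator, so both `start_response` calls happen before any output could have been produced.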

To do anything useful, you'll need to parse two main variables in "environ": "PATH_INFO" and "QUERY_STRING". "PATH_INFO" gives you the "rest of the path" after the mount point for your application server, and "QUERY_STRING" gives you - you guessed it - the query string. You can then implement whatever kind of URL->object mapping your heart desires, whether it be CherryPy-style object publishing, or Django-style regular expressions. You could use the functions in Python's standard cgi module to parse the query string, but Ian Bicking has a great tutorial on how to use Paste to simplify matters quite a bit. All the other WSGI variables that are available in the environment are documented below.
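As a rough illustration of the regular-expression style of mapping, here is a small dispatcher. Everything in it is an assumption made for the example: the `/items/<id>` URL layout, the `format` query argument, and the names `show_item`, `not_found`, `ROUTES` and `dispatcher`. It also parses the query string with `urllib.parse.parse_qs` from the Python 3 standard library rather than the cgi module mentioned above.

```python
import re
from urllib.parse import parse_qs


def show_item(environ, start_response, item_id):
    """Hypothetical handler for /items/<id>."""
    query = parse_qs(environ.get('QUERY_STRING', ''))
    fmt = query.get('format', ['text'])[0]   # parse_qs maps each key to a list
    start_response('200 OK', [('Content-type', 'text/plain')])
    return [('item %s as %s' % (item_id, fmt)).encode('utf-8')]


def not_found(environ, start_response):
    """Fallback when no route matches PATH_INFO."""
    start_response('404 Not Found', [('Content-type', 'text/plain')])
    return [b'Not found']


# Regular-expression routes; named groups become keyword arguments.
ROUTES = [
    (re.compile(r'^/items/(?P<item_id>\d+)$'), show_item),
]


def dispatcher(environ, start_response):
    """A WSGI application that hands the request to the first matching route."""
    path = environ.get('PATH_INFO', '')       # may be absent or empty
    for pattern, handler in ROUTES:
        match = pattern.match(path)
        if match:
            return handler(environ, start_response, **match.groupdict())
    return not_found(environ, start_response)
```

Because `dispatcher` is itself a WSGI application, it can be mounted anywhere a single application is expected, and each handler only ever sees the path relative to that mount point.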

Environment


The variables available in the environ dict are summarized below. For the examples, assume the user requested (using GET) "http://server.com/some/path/myserver/more/path?query_args", and that the application server was mounted at "http://server.com/some/path/myserver".

| Variable | Example | Description | Always Present? |
|---|---|---|---|
| REQUEST_METHOD | "GET" | HTTP method, generally GET or POST | Yes |
| SCRIPT_NAME | "/some/path/myserver" | Location in URL of application server | No - if application server is mounted at server root |
| PATH_INFO | "/more/path" | The rest of the path after the application root | No - for instance, if user requests "http://server.com/some/path/myserver" |
| QUERY_STRING | "query_args" | Anything after the "?" in the URL | No |
| CONTENT_TYPE | <absent> | Any Content-Type fields in the HTTP request | No |
| CONTENT_LENGTH | <absent> | Any Content-Length fields in the HTTP request | No |
| SERVER_NAME | "server.com" | The server name part of the URL | Yes |
| SERVER_PORT | "80" | The server port part of the URL | Yes |
| SERVER_PROTOCOL | "HTTP/1.1" | The request HTTP protocol | Yes |
| HTTP_* | <absent> | Other HTTP headers in request | No |
| wsgi.version | The tuple (1,0) | WSGI version ID | Yes |
| wsgi.url_scheme | "http" | The initial part of the URL | Yes |
| wsgi.input | <empty file-like object> | An object from which the request body can be read - very useful for POSTs | Yes |
| wsgi.errors | <file-like object> | A file-like object to which the application server can write text errors to be logged by the web server | Yes |
| wsgi.multithread | False | Whether the application may be simultaneously invoked in a multithreaded manner | Yes |
| wsgi.multiprocess | True | Whether the application may be simultaneously invoked in a multiprocess manner | Yes |
| wsgi.run_once | False | Tune the application to expect to only run once (e.g. turn off caching) | Yes |
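To make the variables concrete, the sketch below writes the example request out as an environ dict and uses it two ways: rebuilding the requested URL, and reading a request body. The names `reconstruct_url`, `read_body` and `example_environ` are hypothetical. The URL logic is deliberately simplified: it does no URL quoting and ignores any HTTP_HOST header, both of which a careful implementation would handle.

```python
import io


def reconstruct_url(environ):
    """Rebuild the requested URL from the pieces in environ (simplified)."""
    scheme = environ['wsgi.url_scheme']
    url = scheme + '://' + environ['SERVER_NAME']
    # Only spell out the port when it is not the default for the scheme.
    default_port = '443' if scheme == 'https' else '80'
    if environ['SERVER_PORT'] != default_port:
        url += ':' + environ['SERVER_PORT']
    # SCRIPT_NAME, PATH_INFO and QUERY_STRING may be absent, so use .get().
    url += environ.get('SCRIPT_NAME', '') + environ.get('PATH_INFO', '')
    if environ.get('QUERY_STRING'):
        url += '?' + environ['QUERY_STRING']
    return url


def read_body(environ):
    """Read the request body, using CONTENT_LENGTH when it is a valid number."""
    try:
        length = int(environ.get('CONTENT_LENGTH') or 0)
    except ValueError:
        length = 0
    return environ['wsgi.input'].read(length) if length > 0 else b''


# The example GET request, written out as an environ dict.
example_environ = {
    'REQUEST_METHOD': 'GET',
    'SCRIPT_NAME': '/some/path/myserver',
    'PATH_INFO': '/more/path',
    'QUERY_STRING': 'query_args',
    'SERVER_NAME': 'server.com',
    'SERVER_PORT': '80',
    'SERVER_PROTOCOL': 'HTTP/1.1',
    'wsgi.url_scheme': 'http',
    'wsgi.input': io.BytesIO(b''),
}

if __name__ == '__main__':
    print(reconstruct_url(example_environ))
```

Reading exactly CONTENT_LENGTH bytes, instead of calling `read()` with no argument, avoids waiting on an input stream that the server may keep open after the body ends.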

