In all the posting and hype about full-featured frameworks, you may have overlooked a very small "un-framework", the Python web server gateway interface (WSGI). It's generally an option for deploying the large frameworks such as TurboGears or Django. What follows is a very simple and brief overview of how you can create a WSGI-compliant application server.
First off, the WSGI specification itself is a decent read, and I'd be remiss if I didn't at least mention it. Now, on to the simple overview!
Overview
The first thing to realize is that WSGI is exactly what its name implies: an interface. The best way I've found to think of it is "CGI for Python." In CGI, the web server launches some script as a separate process. That process's environment is populated with values from the HTTP request, and the script's output is returned to the client. WSGI is similar, substituting a Python function for the script, a Python dict for the process environment, and skipping the separate process altogether. A basic WSGI application server has the following outline:
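    import sys

    def MyApplication(environ, start_response):
        try:
            # ... maybe do some stuff in response to the environ arg ...
            # send headers (the list may hold any other headers too)
            write_fn = start_response('200 OK', [('Content-type', 'text/html')])
            # ... maybe do some more stuff to build the page ...
            response_text = b'<p>placeholder page</p>'
            # EITHER yield some things (the function is then a generator),
            # OR call write_fn(response_text), which is deprecated,
            # OR return an iterable, as here:
            return [response_text]
        except Exception:
            # exc_info must be the tuple from sys.exc_info(), so call the function
            start_response('500 OOPS', [('Content-type', 'text/html')], sys.exc_info())
            # ... yield, write, or return the text of the error page ...
            return [b'<p>placeholder error page</p>']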
Your application server, then, is just a function (or other callable) that takes two arguments, an "environment" and a "start_response". In the recommended implementation, it will either return an iterable (generally a list of strings, which must be byte strings under Python 3) or itself be an iterable (generally, a generator). The minimal "hello, world!" application is below:
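    def MyApplication(environ, start_response):
        # Send the status line and headers, then yield the body.
        start_response('200 OK', [('Content-type', 'text/plain')])
        yield b"Hello, world!"  # bytes, as Python 3 WSGI servers require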
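The two recommended shapes, returning an iterable or being a generator, can be compared side by side. This is only a sketch: `hello_list`, `hello_gen`, and `call_app` are hypothetical names, and `call_app` is a bare-bones stand-in for a real server that just records what the application passes to `start_response`. Bodies are byte strings, which is what Python 3 servers expect.

```python
def hello_list(environ, start_response):
    # Shape 1: call start_response, then return an iterable of bytes.
    start_response('200 OK', [('Content-type', 'text/plain')])
    return [b"Hello, ", b"world!"]


def hello_gen(environ, start_response):
    # Shape 2: the application is a generator that yields bytes.
    start_response('200 OK', [('Content-type', 'text/plain')])
    yield b"Hello, "
    yield b"world!"


def call_app(app, environ=None):
    """Hypothetical stand-in for a server: invoke app and collect its output."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured['status'] = status
        captured['headers'] = headers
        return lambda data: None  # legacy write() callable, unused here

    # Joining consumes the iterable, which is also what runs a generator body.
    body = b"".join(app(environ or {}, start_response))
    return captured['status'], captured['headers'], body


list_result = call_app(hello_list)
gen_result = call_app(hello_gen)
```

Both calls produce the same status, headers, and body; the only difference is that the generator version does not call `start_response` until the server starts iterating.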
The "environment" is just a dict, much like the CGI environment, whose values are mostly strings. The values available are summarized below. The "start_response" is a callable that your application must call to send the HTTP headers. You can call it up to twice, once for "normal" headers, and once for "error" headers. If you call it a second time, you must call it with an "exc_info" tuple (the result of sys.exc_info()), and it only helps before any output has been sent: the original headers (if there were any) will then be overwritten by the new headers. If output has already gone out, the server re-raises the exception instead.
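The second call to start_response can be illustrated with a small simulation. This is a sketch under stated assumptions, not how any particular server implements it: `make_start_response`, `flaky_app`, and the `state` dict are hypothetical, and the stub only models the rule that a repeat call needs `exc_info` and that headers can be replaced until output has been sent.

```python
import sys


def make_start_response(state):
    """Hypothetical, simplified server-side start_response."""
    def start_response(status, headers, exc_info=None):
        if exc_info is not None:
            try:
                if state['headers_sent']:
                    # Too late to swap headers: re-raise the original error.
                    raise exc_info[1].with_traceback(exc_info[2])
            finally:
                exc_info = None  # avoid keeping a traceback reference cycle
        elif state['status'] is not None:
            raise AssertionError("start_response called twice without exc_info")
        state['status'] = status
        state['headers'] = headers
        return lambda data: None  # legacy write() callable, unused here
    return start_response


def flaky_app(environ, start_response):
    try:
        start_response('200 OK', [('Content-type', 'text/plain')])
        raise ValueError("something broke before any output")
    except ValueError:
        # Second call: allowed because exc_info is passed along.
        start_response('500 Internal Server Error',
                       [('Content-type', 'text/plain')], sys.exc_info())
        return [b"Sorry, something went wrong."]


state = {'status': None, 'headers': None, 'headers_sent': False}
body = b"".join(flaky_app({}, make_start_response(state)))
```

After this runs, `state['status']` holds the error status rather than the original `200 OK`, since nothing had been sent when the error occurred.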
To do anything useful, you'll need to parse two main variables in "environ": "PATH_INFO" and "QUERY_STRING". "PATH_INFO" gives you the "rest of the path" after the mount point for your application server, and "QUERY_STRING" gives you (you guessed it) the query string. You can then implement whatever kind of URL->object mapping your heart desires, whether it be CherryPy-style object publishing, or Django-style regular expressions. You could use the functions in Python's standard cgi module to parse the query string (in current Python, where the cgi module has been removed, urllib.parse.parse_qs does the job), but Ian Bicking has a great tutorial on how to use Paste to simplify matters quite a bit. All the other WSGI variables that are available in the environment are documented below.
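As a rough illustration of the regular-expression style of mapping, the sketch below dispatches on PATH_INFO and reads one parameter from QUERY_STRING using only the standard library. The names `ROUTES`, `router_app`, `greet`, and `not_found`, the `/greet/<name>` URL, and the `greeting` parameter are all made up for this example.

```python
import re
from urllib.parse import parse_qs


def greet(environ, start_response, name):
    # parse_qs maps each key to a list of values, e.g. {'greeting': ['Hi']}.
    query = parse_qs(environ.get('QUERY_STRING', ''))
    greeting = query.get('greeting', ['Hello'])[0]
    start_response('200 OK', [('Content-type', 'text/plain; charset=utf-8')])
    return [f"{greeting}, {name}!".encode('utf-8')]


def not_found(environ, start_response):
    start_response('404 Not Found', [('Content-type', 'text/plain')])
    return [b"Not found"]


# Hypothetical routing table: (compiled pattern, handler) pairs.
ROUTES = [
    (re.compile(r'^/greet/(?P<name>\w+)$'), greet),
]


def router_app(environ, start_response):
    # PATH_INFO may be absent or empty, so fall back to ''.
    path = environ.get('PATH_INFO', '')
    for pattern, handler in ROUTES:
        match = pattern.match(path)
        if match:
            # Named groups in the pattern become keyword arguments.
            return handler(environ, start_response, **match.groupdict())
    return not_found(environ, start_response)
```

With this table, a request whose PATH_INFO is `/greet/alice` and whose QUERY_STRING is `greeting=Hi` would reach `greet` with `name="alice"`; any other path falls through to the 404 handler.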
Environment
The variables available in the environ dict are summarized below. For the examples, assume the user requested (using GET) "http://server.com/some/path/myserver/more/path?query_args", and that the application server was mounted at "http://server.com/some/path/myserver".
Variable | Example | Description | Always Present? |
---|---|---|---|
REQUEST_METHOD | "GET" | HTTP method, generally GET or POST | Yes |
SCRIPT_NAME | "/some/path/myserver" | Location in URL of application server | No - if application server is mounted at server root |
PATH_INFO | "/more/path" | The rest of the path after the application root | No - for instance, if user requests "http://server.com/some/path/myserver" |
QUERY_STRING | "query_args" | Anything after the "?" in the URL | No |
CONTENT_TYPE | <absent> | Any Content-Type fields in the HTTP request | No |
CONTENT_LENGTH | <absent> | Any Content-Length fields in the HTTP request | No |
SERVER_NAME | "server.com" | The server name part of the URL | Yes |
SERVER_PORT | "80" | The server port part of the URL | Yes |
SERVER_PROTOCOL | "HTTP/1.1" | The request HTTP protocol | Yes |
HTTP_* | <absent> | Other HTTP headers in request | No |
wsgi.version | The tuple (1,0) | WSGI version ID | Yes |
wsgi.url_scheme | "http" | The scheme part of the URL, "http" or "https" | Yes |
wsgi.input | <empty file-like object> | An object from which the request body can be read - very useful for POSTs | Yes |
wsgi.errors | <file-like object> | A file-like object to which the application server can write text errors to be logged by the web server | Yes |
wsgi.multithread | False | Whether the application may be simultaneously invoked in a multithreaded manner | Yes |
wsgi.multiprocess | True | Whether the application may be simultaneously invoked in a multiprocess manner | Yes |
wsgi.run_once | False | Tune the application to expect to only run once (e.g. turn off caching) | Yes |
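To see how several of these variables fit together, the sketch below rebuilds the request URL from the example above and reads a request body from wsgi.input. The environ is hand-built for illustration (a real server supplies it), it uses POST rather than the GET of the example so there is a body to read, and `read_body` and the `name=test` payload are hypothetical. `request_uri` comes from the standard library's wsgiref.util.

```python
import io
from wsgiref.util import request_uri


def read_body(environ):
    """Read the request body, treating a missing or bad CONTENT_LENGTH as 0."""
    try:
        length = int(environ.get('CONTENT_LENGTH') or 0)
    except ValueError:
        length = 0
    # Only read as many bytes as the client declared.
    return environ['wsgi.input'].read(length) if length > 0 else b""


# Hand-built environ matching the table's example URL (illustrative only).
environ = {
    'REQUEST_METHOD': 'POST',
    'SCRIPT_NAME': '/some/path/myserver',
    'PATH_INFO': '/more/path',
    'QUERY_STRING': 'query_args',
    'SERVER_NAME': 'server.com',
    'SERVER_PORT': '80',
    'CONTENT_LENGTH': '9',
    'wsgi.url_scheme': 'http',
    'wsgi.input': io.BytesIO(b"name=test"),
}

url = request_uri(environ)   # scheme + host + SCRIPT_NAME + PATH_INFO + query
body = read_body(environ)
```

Using `.get()` for CONTENT_LENGTH, PATH_INFO, and QUERY_STRING matters because, as the last column of the table notes, those keys are not guaranteed to be present.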