Python Buzz Forum - WSGI and Dispatch

A few days ago Mark Baker left a little comment on WSGI on his blog (for background, Mark Baker is an enthusiastic REST advocate).

While he was mostly positive about WSGI, he said this in a comment:

This is the problem [with WSGI]:
    def application(environ, start_response):
Because it encourages bad practice with HTTP, such as those apps that behave the same way to GET and POST requests (as the examples in the article do).

I think this would have been superior:
    def get(environ, start_response):
    def post(environ, start_response):
    def put(environ, start_response):

Well, the first reason this doesn't work is fairly concrete. It would actually have to look like this:

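class Application:
    def get(self, environ, start_response): ...
    def post(self, environ, start_response): ...
    def put(self, environ, start_response): ...
    # ...and so on, one method per request method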

because you can't pass a request to a set of functions without somehow defining what the "set" is. You could also put them in a module, or in some other container object. This opens up a whole bunch of messy and dull questions about how you traverse the container to get to the method that matches the request method.

This also encodes request-method dispatch into the spec without encoding any other kind of dispatch. The bad solution here would be to encode all kinds of dispatch into the spec. What WSGI got really right is that its specification contains no dispatch at all.

At the same time, implementing dispatch in WSGI is really easy. Creating a dispatcher in the style Mark suggests is rather difficult, because you have to do something like this:

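class Dispatcher:
    def dispatch(self, environ, start_response):
        # sub_object() is a hypothetical lookup (by path, host, ...)
        sub = self.sub_object(environ)
        # re-dispatch to the sub-object's own get/post/put method
        handler = getattr(sub, environ['REQUEST_METHOD'].lower())
        return handler(environ, start_response)
    get = post = put = dispatch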

And of course you have to enumerate all the possible request methods. This is just silly, but it's become a common design misfeature in frameworks influenced by the Java Servlet specification (e.g., BaseHTTPServer).

I think this misdesign seems reasonable because other kinds of dispatch are assumed to have already happened and to be opaque to the specification. WSGI does not make this mistaken assumption.
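For comparison, here is a minimal sketch of the same dispatcher written as a plain WSGI callable. The `lookup` hook is a hypothetical stand-in for `sub_object`: any function that picks a sub-application from the environ (by path, by host, or anything else). Nothing in it names a request method, so PUT, DELETE, or an extension method like PROPFIND passes through without the dispatcher knowing about it.

```python
from wsgiref.util import setup_testing_defaults


class Dispatcher:
    """A plain WSGI dispatcher: one entry point, no per-method attributes."""

    def __init__(self, lookup):
        # lookup(environ) -> WSGI app. A hypothetical hook standing in for
        # whatever rule (path, Host header, ...) picks the sub-object.
        self.lookup = lookup

    def __call__(self, environ, start_response):
        # The request method is never inspected here.
        return self.lookup(environ)(environ, start_response)


def echo_method(environ, start_response):
    # A trivial sub-application that reports which method it received.
    body = ("got %s\n" % environ["REQUEST_METHOD"]).encode("ascii")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def call(app, method, path="/"):
    """Tiny test driver: run `app` once and return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["REQUEST_METHOD"] = method
    environ["PATH_INFO"] = path
    result = {}

    def start_response(status, headers, exc_info=None):
        result["status"] = status

    body = b"".join(app(environ, start_response))
    return result["status"], body


app = Dispatcher(lambda environ: echo_method)
print(call(app, "PROPFIND"))
```

The single `__call__` entry point is the whole design choice: since the method travels inside the environ, a dispatcher never has to be told which methods exist.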

The primary kind of dispatch that people do is based on the request path. For example, if you request GET /blog/archive/2006/10, you first have to figure out what /blog/archive/2006/10 is. Here's the general way you do this in WSGI:

1. Probably you start out by realizing that /blog is the blog application, and you delegate the request there. WSGI splits the path into two parts using the CGI convention: the "used" part of the path is in SCRIPT_NAME, and the "unused" part is in PATH_INFO. So the blog app gets SCRIPT_NAME="/blog" and PATH_INFO="/archive/2006/10".
2. How does the blog application parse its part? We assume only the blog application knows the best way to do that. Very possibly it does another prefix-based search on /archive, and then that resource in turn parses /2006/10. At different stages the results of this intermediate parsing may be put into the request in non-standard locations, e.g., environ['routes.url_vars'].
3. Only after all these steps have happened is it likely that GET has any meaning. Thankfully, we've been allowed to ignore the request method entirely until this point.
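The path-handling steps can be sketched with the standard library's `wsgiref.util.shift_path_info`, which moves one path segment from PATH_INFO to SCRIPT_NAME. The `mount` helper, the `archive_app` resource and the `blog.archive_vars` environ key are hypothetical names chosen for illustration, in the spirit of `environ['routes.url_vars']`; a real framework would use its own. Note that nothing below looks at the request method.

```python
import re
from wsgiref.util import setup_testing_defaults, shift_path_info


def not_found(environ, start_response):
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found\n"]


def mount(apps):
    """Prefix dispatch on the next path segment (e.g. "blog").

    shift_path_info() moves that segment from PATH_INFO to SCRIPT_NAME,
    so the chosen app sees only the "unused" rest of the path.
    """
    def dispatcher(environ, start_response):
        app = apps.get(shift_path_info(environ), not_found)
        return app(environ, start_response)
    return dispatcher


MONTH = re.compile(r"^/(\d{4})/(\d{2})/?$")


def archive_app(environ, start_response):
    """Parses the remaining "/2006/10" and records the result."""
    match = MONTH.match(environ["PATH_INFO"])
    if match is None:
        return not_found(environ, start_response)
    # Hypothetical, app-specific location for intermediate parse results.
    environ["blog.archive_vars"] = {"year": match.group(1),
                                    "month": match.group(2)}
    body = "mounted at %s, year=%s, month=%s\n" % (
        environ["SCRIPT_NAME"], match.group(1), match.group(2))
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body.encode("ascii")]


# The blog app does its own prefix dispatch on "archive"...
blog_app = mount({"archive": archive_app})
# ...and the top level only knows that "/blog" belongs to the blog app.
root = mount({"blog": blog_app})
```

Because each level consumes only its own segment, `blog_app` could be mounted at some other prefix without changes.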

Note that this specific example is not the only way it might work. For instance, we might have started with:

First, look at the Host header and dispatch on that.

This adds virtual hosting. Server environments that hard-code step 1 (path-based dispatch) directly into the server often have to create special exceptions for virtual hosting. Or:

First, look at the Host header, match it against (.*)\.myblogs.org, and put the matched value into environ['myblogs.username'].

Now you can set up a wildcard DNS record and have your blog app look up the user based on that key.
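A minimal sketch of that wildcard-subdomain variant as WSGI middleware. The pattern is slightly stricter than `(.*)\.myblogs.org`: it matches a single label, escapes both dots, and tolerates a port. The `user_from_host` and `user_blog` names are hypothetical; a real blog app would look the user up in its own storage.

```python
import re
from wsgiref.util import setup_testing_defaults

# One subdomain label, escaped dots, optional ":port".
USER_HOST = re.compile(r"^([^.]+)\.myblogs\.org(?::\d+)?$")


def user_from_host(app):
    """Middleware: put the subdomain in environ['myblogs.username']."""
    def middleware(environ, start_response):
        # HTTP_HOST is the Host header; fall back to SERVER_NAME.
        host = environ.get("HTTP_HOST") or environ.get("SERVER_NAME", "")
        match = USER_HOST.match(host.lower())
        if match is None:
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"unknown host\n"]
        environ["myblogs.username"] = match.group(1)
        return app(environ, start_response)
    return middleware


def user_blog(environ, start_response):
    # Stand-in for a blog app that looks the user up by that key.
    body = "blog of %s\n" % environ["myblogs.username"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body.encode("utf-8")]


app = user_from_host(user_blog)


def fetch(host):
    """Tiny test driver: return (status, body) for a request to `host`."""
    environ = {}
    setup_testing_defaults(environ)
    environ["HTTP_HOST"] = host
    seen = []
    body = b"".join(app(environ, lambda status, headers, exc_info=None:
                        seen.append(status)))
    return seen[0], body
```

The middleware only annotates the environ; the wrapped app never parses the Host header itself, and could still do its own path dispatch underneath.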

If you want to pay close attention to the request method, you are quite free to do so. WSGI does not tell you how, but it does not in any way hinder you from doing so. A good example of something that pays close attention to the request method is Joe Gregorio's wsgicollection, which could very well be the terminal point for some of these examples.
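Here is a sketch of what paying close attention to the request method can look like at such a terminal point. It is not wsgicollection's actual API; the `Resource` base class and its convention of naming handlers after HTTP methods are assumptions for illustration. The allowed methods are derived from the handlers that exist, so nothing has to enumerate every possible method, and unsupported ones get `405 Method Not Allowed` with the `Allow` header HTTP requires.

```python
from wsgiref.util import setup_testing_defaults


class Resource:
    """Terminal WSGI resource that dispatches on REQUEST_METHOD.

    Subclasses define handlers named after methods (GET, PUT, ...).
    """

    def allowed_methods(self):
        # Upper-case, purely alphabetic attributes are treated as handlers.
        return sorted(name for name in dir(self)
                      if name.isalpha() and name.isupper()
                      and callable(getattr(self, name)))

    def __call__(self, environ, start_response):
        method = environ.get("REQUEST_METHOD", "")
        if method in self.allowed_methods():
            return getattr(self, method)(environ, start_response)
        start_response("405 Method Not Allowed",
                       [("Allow", ", ".join(self.allowed_methods())),
                        ("Content-Type", "text/plain")])
        return [b"method not allowed\n"]


class Entry(Resource):
    def GET(self, environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"the entry\n"]

    def PUT(self, environ, start_response):
        # Read (and, in this sketch, discard) the request body.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        environ["wsgi.input"].read(length)
        start_response("204 No Content", [])
        return []


def call(app, method):
    """Tiny test driver: return (status, headers, body)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["REQUEST_METHOD"] = method
    seen = {}

    def start_response(status, headers, exc_info=None):
        seen["status"], seen["headers"] = status, dict(headers)

    body = b"".join(app(environ, start_response))
    return seen["status"], seen["headers"], body
```

An `Entry()` instance is an ordinary WSGI application, so it can sit at the end of any of the path or host dispatch chains above.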

Because WSGI does not include any form of dispatching, it represents HTTP very accurately. HTTP does not lend special meaning to the request path. It does not say that different domains resolve to different servers. HTTP is a kind of message, and there are many ways to interpret that message. Low-level specifications overstep their bounds when they interpret those messages for you. WSGI is not an educational project; it is infrastructure, and an important feature is that it does not overstep its bounds.

I'll also note that I think WSGI represents the HTTP message very well; the parts of HTTP that it leaves out are primarily about the HTTP connection, which are best handled by the actual connected-to-the-browser server. The rest of the message is all in there if you want to use it.
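To make that concrete, here is a small sketch that rebuilds the request line and headers of the HTTP message from a WSGI environ. `describe_request` is a hypothetical helper, not part of any library. It also shows one caveat: because WSGI follows CGI, header names arrive upper-cased with dashes turned into underscores (User-Agent becomes HTTP_USER_AGENT), so their capitalization is reconstructed rather than preserved. The body, when there is one, is read from environ['wsgi.input'].

```python
from urllib.parse import quote
from wsgiref.util import setup_testing_defaults


def describe_request(environ):
    """Rebuild the request line and headers of the HTTP message."""
    path = quote(environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", ""))
    if environ.get("QUERY_STRING"):
        path += "?" + environ["QUERY_STRING"]
    lines = ["%s %s %s" % (environ["REQUEST_METHOD"], path,
                           environ["SERVER_PROTOCOL"])]
    # CGI keeps these two headers without the HTTP_ prefix.
    for key in ("CONTENT_TYPE", "CONTENT_LENGTH"):
        if environ.get(key):
            lines.append("%s: %s" % (key.replace("_", "-").title(), environ[key]))
    # Every other request header is stored as HTTP_<NAME>.
    for key, value in sorted(environ.items()):
        if key.startswith("HTTP_"):
            lines.append("%s: %s" % (key[5:].replace("_", "-").title(), value))
    return lines


environ = {}
setup_testing_defaults(environ)
environ.update(PATH_INFO="/blog/archive/2006/10", QUERY_STRING="page=2",
               HTTP_HOST="example.org", HTTP_USER_AGENT="demo")
for line in describe_request(environ):
    print(line)
```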
