A few days ago Mark Baker left a short comment about WSGI on
his blog (for background, Mark Baker is an enthusiastic REST advocate).
While he was mostly positive about WSGI, he said this:
This is the problem [with WSGI]:
def application(environ, start_response):
Because it encourages bad practice with HTTP, such as those apps
that behave the same way to GET and POST requests (as the examples
in the article do).
I think this would have been superior:
def get(environ, start_response):
def post(environ, start_response):
def put(environ, start_response):
Well, the first reason this doesn't work is fairly concrete. It would
actually have to look like this:
    class Application:
        def get(self, environ, start_response): ...
        # etc.
because you can't pass a request to a set of functions, without
somehow defining what the "set" is. You could also put them in a
module, or some other container object. This opens up a whole bunch
of messy and dull questions about how you traverse the
container to get to the method that handles the request method.
This also encodes request method dispatch into the spec, without
encoding any other kinds of dispatch into the spec. The bad solution
here would be to encode all kinds of dispatch into the spec. What
WSGI got really right was that it has absolutely no dispatch in
the specification.
At the same time, implementing dispatch in WSGI is really easy.
Creating a dispatcher in the style Mark suggests is rather
difficult, because you have to do something like this:
    class Dispatcher:
        def dispatch(self, environ, start_response):
            # sub_object() (not defined here) picks the object that handles this request
            return self.sub_object(environ)(environ, start_response)
        get = post = put = dispatch
And of course you have to enumerate all the possible request
methods. This is just silly, but it's become a common design
misfeature in frameworks influenced by the Java Servlet specification
(e.g., BaseHTTPServer).
I think this misdesign seems reasonable because other kinds of
dispatch are assumed to have already occurred, and are opaque to the
specification. WSGI does not make this mistaken assumption.
The primary kind of dispatch that people do is based on the request
path. For example, if you do GET /blog/archive/2006/10, first you have
to figure out what /blog/archive/2006/10 is. Here's the general
way you do this in WSGI:
- Probably you start out by realizing that /blog is the blog
application, and you delegate the request there. WSGI separates
the path into two parts using the CGI convention that the "used"
part of the path is in SCRIPT_NAME, and the "unused" part is in
PATH_INFO. So the blog app gets SCRIPT_NAME="/blog" and
PATH_INFO="/archive/2006/10".
- How does the blog application parse its part? We assume only the
blog application knows the best way to do that. Very possibly it
does another prefix-based search on /archive, and then that
resource in turn parses /2006/10. At different stages the
results of this intermediate parsing may go into the request in
non-standard locations, e.g., environ['routes.url_vars'].
- Only after all these steps have happened is it likely that GET
has any meaning. Thankfully we've been allowed to totally ignore
the request method until this point.
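The path-based steps above can be sketched in a few lines of plain WSGI. This is only a minimal illustration, not how any particular framework does it: `make_prefix_dispatcher`, `blog_app`, and `not_found` are hypothetical names invented for the example. The one real convention it relies on is the CGI one described above, where the consumed prefix moves from PATH_INFO to SCRIPT_NAME.

```python
def make_prefix_dispatcher(apps, default):
    """Return a WSGI app that delegates on the leading part of PATH_INFO.

    `apps` maps a prefix such as "/blog" to a WSGI application
    (hypothetical helper, for illustration only).
    """
    def dispatcher(environ, start_response):
        path_info = environ.get("PATH_INFO", "")
        for prefix, app in apps.items():
            if path_info == prefix or path_info.startswith(prefix + "/"):
                # Move the "used" prefix from PATH_INFO to SCRIPT_NAME.
                environ["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + prefix
                environ["PATH_INFO"] = path_info[len(prefix):]
                return app(environ, start_response)
        return default(environ, start_response)
    return dispatcher


def blog_app(environ, start_response):
    # A real blog app would keep parsing PATH_INFO; this one just reports it.
    body = "SCRIPT_NAME=%s PATH_INFO=%s" % (
        environ["SCRIPT_NAME"], environ["PATH_INFO"])
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body.encode("utf-8")]


def not_found(environ, start_response):
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not Found"]


application = make_prefix_dispatcher({"/blog": blog_app}, not_found)
```

Note that nothing here looks at the request method; the dispatcher only decides who gets the request.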
Note that this specific example is not the only way it might work. For instance, we might have started with:
- First, look at the Host header and dispatch on that.
This adds virtual hosting. Server environments that hard-code the
path-based first step directly into the server often have to create
special exceptions for virtual hosting. Or:
- First, look at the Host header, matching (.*)\.myblogs\.org,
and putting the matched value into environ['myblogs.username'].
Now you can set up a wildcard DNS and get your blog app to look up the
user based on that key.
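As a rough sketch of that Host-based step, a small piece of middleware can do the match and stash the result. The names `host_to_username` and `user_blog_app` are made up for this example; the environ key `myblogs.username` and the pattern come from the text above, and reading the header from `HTTP_HOST` is the standard CGI/WSGI spelling.

```python
import re

# Pattern from the example above, with both dots escaped and anchored.
HOST_PATTERN = re.compile(r"^(.*)\.myblogs\.org$")


def host_to_username(app):
    """Wrap `app`, recording the blog owner taken from the Host header."""
    def middleware(environ, start_response):
        # Drop an optional ":port" suffix before matching.
        host = environ.get("HTTP_HOST", "").split(":")[0]
        match = HOST_PATTERN.match(host)
        if match:
            environ["myblogs.username"] = match.group(1)
        return app(environ, start_response)
    return middleware


def user_blog_app(environ, start_response):
    # The wrapped app looks the user up by the key set above.
    username = environ.get("myblogs.username", "(none)")
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [("blog for %s" % username).encode("utf-8")]


application = host_to_username(user_blog_app)
```

The wrapped application never needs to know how the username was found, only where to look for it.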
If you want to pay close attention to the request method, you are
quite free to do so. WSGI does not tell you how, but it does not in
any way hinder you from doing so. A good example of something that
pays close attention to the request method is Joe Gregorio's
wsgicollection, which
could very well be the terminal point for some of these examples.
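To illustrate how little WSGI gets in the way here, this is one possible sketch of request-method dispatch built on top of the plain `application(environ, start_response)` interface. It is not how wsgicollection works; `MethodDispatcher`, `show_entry`, and `add_entry` are hypothetical names for this example only.

```python
class MethodDispatcher:
    """WSGI app that picks a handler based on REQUEST_METHOD.

    Handlers are passed as keyword arguments, e.g. get=..., post=...
    (hypothetical helper, for illustration only).
    """

    def __init__(self, **handlers):
        self.handlers = {name.upper(): app for name, app in handlers.items()}

    def __call__(self, environ, start_response):
        handler = self.handlers.get(environ.get("REQUEST_METHOD", "GET"))
        if handler is None:
            # Unknown or unsupported method: say which ones are allowed.
            allowed = ", ".join(sorted(self.handlers))
            start_response("405 Method Not Allowed",
                           [("Content-Type", "text/plain"), ("Allow", allowed)])
            return [b"Method Not Allowed"]
        return handler(environ, start_response)


def show_entry(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"entry"]


def add_entry(environ, start_response):
    start_response("201 Created", [("Content-Type", "text/plain")])
    return [b"created"]


resource = MethodDispatcher(get=show_entry, post=add_entry)
```

Because `resource` is itself an ordinary WSGI application, it can sit at the end of any of the path or host dispatch chains described earlier, and there is no need to enumerate every possible request method up front.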
Because WSGI does not include any form of dispatching, it represents
HTTP very accurately. HTTP does not lend special meaning to the
request path. It does not say that different domains resolve to
different servers. HTTP is a kind of message, and there are many ways
to interpret that message. Low-level specifications overstep their bounds when they interpret those messages for you.
WSGI is not an educational project; it is infrastructure, and an important feature is that it does not overstep its bounds.
I'll also note that I think WSGI represents the HTTP message very
well; the parts of HTTP that it leaves out are primarily about the
HTTP connection, which are best handled by the actual
connected-to-the-browser server. The rest of the message is all in
there if you want to use it.