Python Buzz Forum - How to do a Conditional HTTP GET with Python urllib2

Jarno Virtanen

Jarno Virtanen is a university student for life, it seems, and a part time software developer

Posted: Sep 28, 2003 12:15 PM

This post originated from an RSS feed registered with Python Buzz by Jarno Virtanen.
Original Post: How to do a Conditional HTTP GET with Python urllib2
Feed Title: Python owns us
Feed URL: http://sedoparking.com/search/registrar.php?domain=&registrar=sedopark
Feed Description: A weblog about Python from the view point of Jarno Virtanen.

It's not rocket science, but since the obvious Google query doesn't yield anything canonical, I decided to cook up a short explanation.

For background information, read Charles Miller's concise explanation of, and motivation for, Conditional GETs in HTTP.
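
To make the server's side of the exchange concrete, here is a minimal Python 3 sketch of the decision a server makes before answering 304 Not Modified. It is not part of the original example: it assumes the precedence rules of RFC 7232 (since folded into RFC 9110), where If-None-Match wins over If-Modified-Since, and the function name is_not_modified is hypothetical.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def _parse_http_date(value):
    """Parse an HTTP-date such as 'Sun, 28 Sep 2003 10:00:00 GMT'.

    Returns an aware UTC datetime, or None if the value cannot be parsed.
    """
    try:
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if parsed.tzinfo is None:
        # HTTP dates are always in GMT; treat a naive result as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def _opaque_tag(etag):
    """Drop a W/ prefix so weak and strong forms compare equal (weak comparison)."""
    etag = etag.strip()
    return etag[2:] if etag.startswith("W/") else etag


def is_not_modified(if_none_match, if_modified_since, current_etag, last_modified):
    """Hypothetical check: should a GET for an existing resource get a 304?

    Arguments are raw header strings, or None when a header is absent.
    """
    if if_none_match is not None:
        # If-None-Match takes precedence; If-Modified-Since is then ignored.
        if if_none_match.strip() == "*":
            return True  # "*" matches any current representation
        if current_etag is None:
            return False
        # Simplification: assumes no commas inside the quoted tags.
        candidates = {_opaque_tag(tag) for tag in if_none_match.split(",")}
        return _opaque_tag(current_etag) in candidates
    if if_modified_since is not None and last_modified is not None:
        since = _parse_http_date(if_modified_since)
        modified = _parse_http_date(last_modified)
        if since is not None and modified is not None:
            # Unchanged if the resource is no newer than the client's copy.
            return modified <= since
    return False
```

The client only has to echo back the ETag and Last-Modified values it was given; all of the comparison logic lives on the server.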

(The whole source code used in this example as one file.)

First, we need to retrieve the modification information, the ETag and Last-Modified header values, of the HTTP resource:

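import urllib2

URL = "http://www.hole.fi/jajvirta/weblog/"

# An ordinary (unconditional) GET of the resource.
req = urllib2.Request(URL)
url_handle = urllib2.urlopen(req)
headers = url_handle.info()

# getheader() returns None if the server did not send the header.
etag = headers.getheader("ETag")
last_modified = headers.getheader("Last-Modified")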
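
urllib2 exists only in Python 2; in Python 3 its functionality lives in urllib.request, and the response headers are read with .get() instead of getheader(). A minimal Python 3 sketch of the same first step follows; the helper name fetch_validators is hypothetical, not from the original post.

```python
import urllib.request


def fetch_validators(url):
    """Fetch url once and return its (etag, last_modified) header values.

    Either element is None when the server does not send that header.
    """
    with urllib.request.urlopen(url) as response:
        # response.headers is an email.message.Message (for HTTP, the
        # subclass http.client.HTTPMessage); .get() returns None if absent.
        headers = response.headers
        return headers.get("ETag"), headers.get("Last-Modified")
```

Calling fetch_validators("http://www.hole.fi/jajvirta/weblog/") would perform the unconditional first request and return the two validators as a tuple.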

Now that we have the modification information, we can use it in subsequent requests. We also need an error handler for the request: if the web page has not been modified, the server responds with status code 304 (Not Modified), and urllib2 treats a 304 as an error, raising HTTPError by default. So we set up a handler based on urllib2's BaseHandler. It looks like this:

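class NotModifiedHandler(urllib2.BaseHandler):
    """Return 304 Not Modified responses instead of raising HTTPError."""

    def http_error_304(self, req, fp, code, message, headers):
        # Wrap the response in a file-like handle and record the status code.
        addinfourl = urllib2.addinfourl(fp, headers, req.get_full_url())
        addinfourl.code = code
        return addinfourl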

urllib2 calls the handler's http_error_304() method when the server answers with status code 304. The method builds a stand-in URL handle with urllib2's addinfourl class and returns it to the caller, so the result of open() can be processed like that of an ordinary request. It also attaches the status code to the handle as its code attribute.
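
For readers on Python 3, the same handler can be written against urllib.request; there, addinfourl lives in urllib.response and accepts the status code as a fourth argument. A sketch, assuming Python 3 and keeping the handler name from the example above:

```python
import urllib.request
import urllib.response


class NotModifiedHandler(urllib.request.BaseHandler):
    """Python 3 sketch: return 304 responses instead of raising HTTPError."""

    def http_error_304(self, req, fp, code, msg, headers):
        # Passing code to addinfourl makes the returned handle report it
        # through .code and .getcode().
        return urllib.response.addinfourl(fp, headers, req.get_full_url(), code)
```

It is installed the same way as the urllib2 version, with urllib.request.build_opener(NotModifiedHandler()).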

Next we use these facilities in a second request. This time we build an OpenerDirector with urllib2.build_opener(), passing it our handler, and add the conditional headers If-None-Match and If-Modified-Since to the request:

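req = urllib2.Request(URL)
# Send back the validators from the first response, if the server gave any.
if etag:
    req.add_header("If-None-Match", etag)
if last_modified:
    req.add_header("If-Modified-Since", last_modified)

# build_opener() installs our handler alongside urllib2's default handlers.
opener = urllib2.build_opener(NotModifiedHandler())
url_handle = opener.open(req)
headers = url_handle.info()  # the stand-in addinfourl handle has .info() too

# hasattr() guards against handles that carry no .code attribute.
if hasattr(url_handle, 'code') and url_handle.code == 304:
    print "the web page has not been modified"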
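
In Python 3 a custom handler is not strictly required: urllib.request raises urllib.error.HTTPError for a 304 response, and the exception's code attribute can be checked. The sketch below uses that alternative technique rather than the handler approach above; the helper names build_conditional_request and conditional_get are hypothetical.

```python
import urllib.error
import urllib.request


def build_conditional_request(url, etag=None, last_modified=None):
    """Return a Request carrying the stored validators, if any."""
    req = urllib.request.Request(url)
    if etag:
        req.add_header("If-None-Match", etag)
    if last_modified:
        req.add_header("If-Modified-Since", last_modified)
    return req


def conditional_get(url, etag=None, last_modified=None):
    """Return (not_modified, body); body is None when the server sends 304.

    Relies on urllib.request's default behaviour of raising HTTPError for
    a 304 status, and catches it instead of installing a custom handler.
    """
    req = build_conditional_request(url, etag, last_modified)
    try:
        with urllib.request.urlopen(req) as response:
            return False, response.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:
            return True, None
        raise  # any other error status is still an error
```

Note that urllib.request stores header names via str.capitalize(), so the header added above is looked up as req.get_header("If-none-match").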

If the page did not change between the two requests, and the server supports these validators, the program prints "the web page has not been modified".

In this example both requests are made in the same run of the program, so the ETag and Last-Modified values are simply kept in Python strings. Typically, though, the requests happen in different runs, so you need to store the modification information somewhere persistent, for example on disk.
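
A minimal sketch of that storage step in Python 3, assuming a JSON file keyed by URL. The file layout and the function names load_validators and save_validators are hypothetical, not from the original post.

```python
import json
import os
import tempfile


def _read_cache(cache_path):
    """Return the stored {url: {"etag": ..., "last_modified": ...}} mapping."""
    try:
        with open(cache_path, "r", encoding="utf-8") as f:
            return json.load(f)
    except (FileNotFoundError, ValueError):
        # No cache yet, or a corrupt one: start with nothing stored.
        return {}


def load_validators(cache_path, url):
    """Return (etag, last_modified) saved for url, or (None, None)."""
    entry = _read_cache(cache_path).get(url, {})
    return entry.get("etag"), entry.get("last_modified")


def save_validators(cache_path, url, etag, last_modified):
    """Store the validators for url, keeping entries for other URLs."""
    cache = _read_cache(cache_path)
    cache[url] = {"etag": etag, "last_modified": last_modified}
    # Write a temporary file next to the cache and rename it into place, so
    # an interrupted run cannot leave a half-written cache behind.
    directory = os.path.dirname(os.path.abspath(cache_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(cache, f)
        os.replace(tmp_path, cache_path)
    except BaseException:
        os.remove(tmp_path)
        raise
```

A program would call load_validators() before building the conditional request and save_validators() with the headers of each full (non-304) response.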
