The Artima Developer Community

Python Buzz Forum
How to do a Conditional HTTP GET with Python urllib2

Jarno Virtanen

Posts: 109
Nickname: jajvirta
Registered: May, 2003

Jarno Virtanen is a university student for life, it seems, and a part-time software developer.
How to do a Conditional HTTP GET with Python urllib2 Posted: Sep 28, 2003 12:15 PM

This post originated from an RSS feed registered with Python Buzz by Jarno Virtanen.
Original Post: How to do a Conditional HTTP GET with Python urllib2
Feed Title: Python owns us
Feed URL: http://sedoparking.com/search/registrar.php?domain=&registrar=sedopark
Feed Description: A weblog about Python from the view point of Jarno Virtanen.

It's not rocket science, but since the obvious Google query doesn't yield anything canonical, I decided to cook up a short explanation.

For background information, read Charles Miller's concise explanation of, and motivation for, Conditional GETs in HTTP.

(The whole source code used in this example is also available as one file.)

First, we need to retrieve the modification information, the ETag and Last-Modified header values, of the HTTP resource:

import urllib2

URL = "http://www.hole.fi/jajvirta/weblog/"
req = urllib2.Request(URL)
url_handle = urllib2.urlopen(req)
headers = url_handle.info()

# Either value is None if the server did not send that header.
etag = headers.getheader("ETag")
last_modified = headers.getheader("Last-Modified")

Now that we have the modification information, we can use it in subsequent requests. We also need an error handler for the request, because the server responds with status code 304 if the web page has not been modified, and urllib2 treats that status as an error. So we set up an error handler based on urllib2's BaseHandler. The handler looks like this:

class NotModifiedHandler(urllib2.BaseHandler):

    def http_error_304(self, req, fp, code, message, headers):
        # Wrap the 304 response so the caller gets a normal URL handle.
        response = urllib2.addinfourl(fp, headers, req.get_full_url())
        response.code = code
        return response

The handler is called when the server returns HTTP status code 304. In that case we build a stand-in URL handle with urllib2's addinfourl class and pass it back to the caller, so that the result of open() can be processed as if it came from an ordinary request. We also attach the status code to the stand-in handle.
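
For readers on Python 3, where urllib2 was split into urllib.request, urllib.response and urllib.error, the same handler could look roughly like the sketch below. It is a port made under that assumption, not code from the original post; the class name is reused only for continuity.

```python
import urllib.request
import urllib.response


class NotModifiedHandler(urllib.request.BaseHandler):
    """Turn a 304 response into a normal return value instead of an HTTPError."""

    def http_error_304(self, req, fp, code, msg, headers):
        # Wrap the raw response so the caller still gets .headers, .url and .code.
        response = urllib.response.addinfourl(fp, headers, req.get_full_url(), code)
        response.msg = msg
        return response
```

An opener built with urllib.request.build_opener(NotModifiedHandler()) would then return this object for a 304 rather than raising.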

Next we make use of these facilities in another request. This time we build an OpenerDirector with urllib2.build_opener() and add a couple of headers to the request:

req = urllib2.Request(URL)
if etag:
    req.add_header("If-None-Match", etag)
  
if last_modified:
    req.add_header("If-Modified-Since", last_modified)
 
opener = urllib2.build_opener(NotModifiedHandler())
url_handle = opener.open(req)
headers = url_handle.info() # the addinfourls have the .info() too
 
if hasattr(url_handle, 'code') and url_handle.code == 304:
    print "the web page has not been modified"

Now the program should print out that "the web page has not been modified".
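
The second request can also be wrapped in a small helper. The sketch below assumes Python 3 (urllib.request in place of urllib2) and, instead of installing a custom handler, relies on urlopen's default behaviour of raising HTTPError for a 304. The function names build_conditional_request and fetch_if_modified are made up for illustration.

```python
import urllib.error
import urllib.request


def build_conditional_request(url, etag=None, last_modified=None):
    """Return a Request carrying whichever validators we have."""
    req = urllib.request.Request(url)
    if etag:
        req.add_header("If-None-Match", etag)
    if last_modified:
        req.add_header("If-Modified-Since", last_modified)
    return req


def fetch_if_modified(url, etag=None, last_modified=None):
    """Return (body, etag, last_modified), or None if the server answered 304."""
    req = build_conditional_request(url, etag, last_modified)
    try:
        with urllib.request.urlopen(req) as response:
            headers = response.headers
            return response.read(), headers.get("ETag"), headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        if err.code == 304:
            return None  # not modified: keep using the cached copy
        raise
```

Calling fetch_if_modified(URL, etag, last_modified) with the values from the first request should return None as long as the page has not changed and the server honours conditional requests.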

In this example the two requests were made in the same run of the program, so the ETag and Last-Modified values were kept in Python strings. Typically the requests happen in different runs of the program, so you need to store the modification information somewhere persistent, for example on disk.
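
One way to keep the values between runs is a small JSON file. The sketch below is only an illustration: the file layout and the names load_validators and save_validators are assumptions, not part of the original post.

```python
import json
import os


def load_validators(path):
    """Return a dict with 'etag' and 'last_modified' (None when unknown)."""
    empty = {"etag": None, "last_modified": None}
    if not os.path.exists(path):
        return empty
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
    # Ignore any unexpected keys and fill in missing ones with None.
    return {key: data.get(key) for key in empty}


def save_validators(path, etag, last_modified):
    """Write the header values from the latest successful response."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"etag": etag, "last_modified": last_modified}, f)
```

On each run the program would load the stored values, send them as If-None-Match and If-Modified-Since, and save the new header values whenever the server returns a full response.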



Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use