Python Buzz Forum
Check a Google Sitemap for bad URLs

Aaron Brady

Posts: 576
Nickname: insommeuk
Registered: Aug, 2003

Aaron Brady is lead developer for Crestsource
Posted: Aug 4, 2010 3:34 AM

This post originated from an RSS feed registered with Python Buzz by Aaron Brady.
Original Post: Check a Google Sitemap for bad URLs
Feed Title: insom.me.uk
Feed URL: http://feeds2.feedburner.com/insommeuk
Feed Description: Posts related to using Python. Some tricks and tips, observations, hacks, and the Brand New Things.

Create cagsmfbu.py:

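import sys
import httplib2
import xml.dom.minidom as md

H = httplib2.Http()
# minidom accepts a filename directly, so no file handle is left open.
X = md.parse(sys.argv[1])
locs = X.getElementsByTagName("loc")
for loc in locs:
    # Each <loc> element holds one URL as its text content.
    url = loc.childNodes[0].nodeValue.strip()
    try:
        res, content = H.request(url)
        print("%s\t%d" % (url, res.status))
    except Exception:
        # Any request failure (too many redirects, DNS errors, refused
        # connections) is reported as TOOMANY. Catching Exception rather
        # than using a bare except keeps Ctrl-C working.
        print("%s\tTOOMANY" % url)
    # Flush after every URL so results appear immediately when piped.
    sys.stdout.flush()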
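
The <loc> extraction step can be tried on its own, without making any requests. The following is only a sketch: the helper name sitemap_urls and the sample sitemap are made up for illustration and are not part of the original script. It also skips empty <loc> elements, which would otherwise raise an IndexError on childNodes[0].

```python
import xml.dom.minidom as md

# Hypothetical sample sitemap, used only for illustration.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/</loc></url>
  <url><loc>
    http://example.com/about
  </loc></url>
  <url><loc></loc></url>
</urlset>"""


def sitemap_urls(xml_text):
    """Return the non-empty <loc> values from a sitemap string."""
    doc = md.parseString(xml_text)
    urls = []
    for loc in doc.getElementsByTagName("loc"):
        # Join all text children; an empty <loc> has no child nodes.
        text = "".join(
            node.nodeValue
            for node in loc.childNodes
            if node.nodeType == node.TEXT_NODE
        ).strip()
        if text:
            urls.append(text)
    return urls


print(sitemap_urls(SAMPLE))
# ['http://example.com/', 'http://example.com/about']
```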

Then run it against a sitemap file, saving the tab-separated results as they are printed:

% python cagsmfbu.py sitemap.xml | tee output.tdf
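
Once output.tdf exists, the bad URLs still have to be picked out of it. One possible way to do that, sketched here under the assumption that every line has the form URL, tab, status (as the script prints it): the function name bad_urls and the sample rows are hypothetical, and treating only 200 as good is a choice you may want to change.

```python
import csv
import io


def bad_urls(lines, ok=("200",)):
    """Yield (url, status) pairs whose status is not in `ok`."""
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        url, status = row
        if status not in ok:
            yield url, status


# Hypothetical sample of what output.tdf might contain.
SAMPLE = (
    "http://example.com/\t200\n"
    "http://example.com/gone\t404\n"
    "http://example.com/loop\tTOOMANY\n"
)

print(list(bad_urls(io.StringIO(SAMPLE))))
# [('http://example.com/gone', '404'), ('http://example.com/loop', 'TOOMANY')]
```

To run it on the real file, pass an open file object instead of the StringIO, for example open("output.tdf", newline="").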

