The Artima Developer Community
Ruby Buzz Forum
Automatic OPML reading list from Tech Memeorandum

Adam Green


Adam Green is the author of Ruby.Darwinianweb.com
Posted: Feb 5, 2006 6:45 AM

This post originated from an RSS feed registered with Ruby Buzz by Adam Green.
Original Post: Automatic OPML reading list from Tech Memeorandum
Feed Title: ruby.darwinianweb.com
Feed URL: http://www.nemesis-one.com/rss.xml
Feed Description: Adam Green's Ruby development site

This new script reads my XML list of Tech Memeorandum blogs and generates an OPML file with the matching RSS feeds. I do some simple RSS feed autodiscovery in each blog to find the feed. What this latest project really proves is that I don't actually know how to use regular expressions. If I did, I could probably condense this script down to a few lines. Anyway, it works, and maybe I'll get around to reading up on regex someday. The script is running on my mashup blog, and the resulting OPML file is updated every hour and placed here. You can grab the file once or subscribe to it as a reading list in your aggregator. Please let me know if you make any improvements to the script.
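
For readers who want to try the script, here is a minimal sketch of the blog list it reads. The element names (tmblogs, blog, title, htmlUrl) come from the XPath expressions in tmopml.rb; the two entries, the SAMPLE_LIST constant, and the read_blog_list helper are made up for illustration and are not part of the original project.

```ruby
require "rexml/document"

# Hypothetical sample of the blog list. Only the element names are
# taken from tmopml.rb; the entries themselves are invented.
SAMPLE_LIST = <<~XML
  <tmblogs>
    <blog>
      <title>Example Blog</title>
      <htmlUrl>http://example.com/</htmlUrl>
    </blog>
    <blog>
      <title>Q&amp;A Weekly</title>
      <htmlUrl>http://example.org/blog/</htmlUrl>
    </blog>
  </tmblogs>
XML

# Return an array of [title, html_url] pairs from the list's XML text.
def read_blog_list(xml_text)
  doc = REXML::Document.new(xml_text)
  blogs = []
  doc.elements.each("tmblogs/blog") do |blog|
    blogs << [blog.elements["title"].text, blog.elements["htmlUrl"].text]
  end
  blogs
end

read_blog_list(SAMPLE_LIST).each do |title, html_url|
  puts "#{title} -> #{html_url}"
end
```

Note that REXML hands back decoded text, so a title stored as Q&amp;amp;A comes out with a bare ampersand and has to be escaped again before it is written into the OPML file.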

tmopml.rb

#!/usr/bin/ruby
# tmopml.rb
# Create an OPML reading list based on a list of
# blogs cited on http://tech.memeorandum.com.
#
# Copyright (C) 2006 Adam Green
# http://mashup.darwinianweb.com, adam AT darwinianweb DOT com
# This program is distributed under the same license as Ruby.
#
require "open-uri"
require "rexml/document"
require "time" # provides Time#rfc2822
include REXML

# Escape text before it goes into an XML attribute, so a title or URL
# containing &, <, > or " cannot break the OPML file.
XML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }
def xml_escape(text)
  text.to_s.gsub(/[&<>"]/, XML_ESCAPES)
end

# Create the OPML file.
opmlfile = File.new("public_html/projects/tmblogs/tmopml.xml", "w")
opmlfile.puts('<?xml version="1.0" encoding="UTF-8"?>')
opmlfile.puts('<opml version="1.1">')
opmlfile.puts(' <head>')
opmlfile.puts(' <title>Tech Memeorandum Reading List</title>')
opmlfile.puts(' <dateCreated>' + Time.now.rfc2822 + '</dateCreated>')
opmlfile.puts(' <ownerName>Adam Green - Mashup.Darwinianweb.com</ownerName>')
opmlfile.puts(' </head>')
opmlfile.puts(' <body>')

# Open the list of blogs maintained at
# http://mashup.darwinianweb.com/projects/tmblogs/tmblogs.xml
doc = Document.new(File.read("public_html/projects/tmblogs/tmblogs.xml"))
doc.elements.each("tmblogs/blog") do |blog|
  title = blog.elements["title"].text
  htmlurl = blog.elements["htmlUrl"].text

  begin
    # Get the blog page's text. On current Ruby versions open-uri is
    # reached through URI.open; plain open no longer accepts URLs.
    pagetext = URI.open(htmlurl) { |page| page.read }

    # Pull out all the link tags, including ones that span lines.
    feedfound = false
    pagetext.scan(/<link.*?>/im).each do |tag|
      # Find the first feed link: rel="alternate" plus a feed MIME type.
      # The original test combined its strings with && and ||, which
      # left only the application/rss+xml check in effect.
      next if feedfound
      next unless tag =~ /rel\s*=\s*["']?alternate/i
      next unless tag =~ %r{type\s*=\s*["']?(application/(rss|atom|x\.atom|x-atom)\+xml|text/xml)}i

      # Extract the feed's URL from the untouched tag, so the URL
      # keeps its original case.
      matchdata = /href\s*=\s*["'](.*?)["']/i.match(tag)
      next unless matchdata
      xmlurl = matchdata[1]
      feedfound = true
      puts title, htmlurl
      opmlfile.puts(' <outline type="rss" text="' + xml_escape(title) + '" xmlUrl="' + xml_escape(xmlurl) + '" htmlUrl="' + xml_escape(htmlurl) + '" />')
    end

  # Trap time outs and other fetch errors so one bad blog
  # does not stop the whole run.
  rescue StandardError => e
    puts title, "error: #{e.class}"
  end
end

opmlfile.puts(' </body>')
opmlfile.puts('</opml>')
opmlfile.close
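
On the point about condensing the autodiscovery with regular expressions, here is one possible sketch, not the author's code. It assumes feeds are advertised through link tags with rel="alternate" and one of the MIME types listed in the script. The names FEED_TYPES, tag_attributes, and find_feed_url are hypothetical, and the example.com URLs are placeholders. It also resolves relative href values against the blog's URL with URI.join, a step the script above does not attempt.

```ruby
require "uri"

# MIME types treated as feeds, taken from the list in tmopml.rb.
FEED_TYPES = %w[
  application/rss+xml
  application/atom+xml
  application/x.atom+xml
  application/x-atom+xml
  text/xml
].freeze

# Pull the quoted attribute name/value pairs out of a single tag.
# Names are lowercased; values keep their original case.
def tag_attributes(tag)
  pairs = tag.scan(/([\w:-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/)
  pairs.map { |name, dq, sq| [name.downcase, dq || sq] }.to_h
end

# Return the absolute URL of the first feed advertised in html,
# or nil when no matching link tag is present.
def find_feed_url(html, base_url)
  html.scan(/<link\b[^>]*>/i) do |tag|
    attrs = tag_attributes(tag)
    next unless attrs["rel"].to_s.downcase.split.include?("alternate")
    next unless FEED_TYPES.include?(attrs["type"].to_s.downcase)
    next if attrs["href"].to_s.empty?
    return URI.join(base_url, attrs["href"]).to_s
  end
  nil
end

page = '<link rel="alternate" type="application/rss+xml" href="/index.rdf">'
puts find_feed_url(page, "http://example.com/blog/")
```

Unquoted attribute values are ignored by this sketch, and URI.join raises URI::InvalidURIError on a malformed href, so a caller would probably want to rescue that alongside the fetch errors.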


Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use