This post originated from an RSS feed registered with Python Buzz
by Ian Bicking.
Original Post: Python HTML Parser Performance
In preparation for my PyCon talk on HTML I thought I’d do a performance comparison of several parsers and document models.
The situation is a little complex because there are different steps in handling HTML:
1. Parse the HTML
2. Parse it into something (a document object)
3. Serialize it
Some libraries handle step 1, some handle step 2, some handle steps 1, 2, and 3, etc. [...]
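To make the three steps concrete, here is a minimal sketch using only the Python standard library. It is not one of the libraries compared in the post. The names `TreeBuildingParser`, `parse`, `serialize`, and `VOID` are hypothetical, chosen for illustration, and the sketch assumes reasonably well-formed input (real-world HTML parsers do far more error recovery).

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Elements that never have a closing tag (a partial list, for illustration only).
VOID = {"br", "hr", "img", "input", "link", "meta"}


class TreeBuildingParser(HTMLParser):
    """Step 1 (tokenizing the HTML) feeds step 2 (building a document object)."""

    def __init__(self):
        super().__init__()
        self.builder = ET.TreeBuilder()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        self.builder.start(tag, {name: value or "" for name, value in attrs})
        if tag in VOID:
            # Void elements are closed immediately so the tree stays balanced.
            self.builder.end(tag)

    def handle_endtag(self, tag):
        if tag not in VOID:
            self.builder.end(tag)

    def handle_data(self, data):
        self.builder.data(data)


def parse(html):
    """Steps 1 and 2: parse the markup and return the root Element."""
    parser = TreeBuildingParser()
    parser.feed(html)
    parser.close()
    return parser.builder.close()


def serialize(root):
    """Step 3: turn the document object back into an HTML string."""
    return ET.tostring(root, encoding="unicode", method="html")


if __name__ == "__main__":
    doc = parse('<html><body><p class="x">Hello <b>world</b></p></body></html>')
    print(doc.find("body/p").get("class"))
    print(serialize(doc))
```

Here the tokenizer (`HTMLParser`), the document model (`ElementTree`), and the serializer (`ET.tostring`) are separate pieces, which is why a benchmark can time each step on its own or all three together.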