The Artima Developer Community

Ruby Buzz Forum
libxml-ruby 1.1.3 - Boosting Performance

Charlie Savage
Posted: Mar 21, 2009 11:11 PM

This post originated from the cfis RSS feed (Charlie's Blog, http://cfis.savagexi.com/articles.rss), registered with Ruby Buzz.

I'm happy to announce the release of libxml-ruby 1.1.3. Besides including the usual assortment of new features and bug fixes, this release also includes a speed boost of roughly 10% to 20%.

The speedup was prompted by RubyInside's recent post comparing the performance of Ruby XML parsers. As expected, libxml-ruby blew away Hpricot and REXML in raw parsing speed (admittedly a simplistic view of what matters in an XML processor, but still an important one). But it consistently finished a bit behind Nokogiri.

I was a bit surprised by that, since libxml-ruby and Nokogiri both use the libxml2 library as their parsing engine. Because the test cases almost exclusively measured parsing, the two extensions should have had essentially identical run times.

Since the times differed, the obvious conclusion was that the two extensions were either calling different libxml2 APIs or configuring libxml2 with different settings. I suspected the latter, but when investigating performance you never know beforehand.

Not to bore everyone with the nitty-gritty details of using libxml2, but when looking into the first test, parsing an in-memory string, it didn't look like there was much difference in the API calls.

For libxml-ruby:

xmlCreateMemoryParserCtxt
xmlParseDocument

For Nokogiri:

xmlReadMemory
  -> xmlCreateMemoryParserCtxt
  -> xmlDoRead
     -> xmlParseDocument

So that didn't solve the mystery.

The next possibility was that xmlDoRead was modifying the libxml2 parser context. Now, a libxml2 parser context is a beast of a thing; for those brave souls who want to take a peek, the xmlParserCtxt structure is defined in libxml2's online documentation.

Working through the options one by one, I finally found the culprit, an obscure field in the structure:

int	dictNames	: Use dictionary names for the tree

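This setting controls whether libxml2 uses a dictionary to cache strings it has already parsed, such as element and attribute names. Since the same names repeat throughout a typical document, caching them avoids allocating and copying the same strings over and over, and that makes a big difference. So it should be enabled by default, and it now is in libxml-ruby 1.1.3 and higher.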
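
To make the idea concrete, here is a minimal Ruby sketch of a string dictionary (an interning table). It only illustrates the concept: `NameDict` and `intern` are hypothetical names, and libxml2's real dictionary (`xmlDict`) is implemented in C. Each distinct name is stored once, and every later occurrence gets back that same object instead of a fresh copy.

```ruby
# Hypothetical illustration of the idea behind libxml2's dictNames flag.
# This is NOT libxml2's xmlDict; it is a plain-Ruby interning table.
class NameDict
  attr_reader :lookups

  def initialize
    @table = {}   # string contents -> the single canonical (frozen) copy
    @lookups = 0
  end

  # Return the canonical copy of +name+, storing it the first time it is seen.
  def intern(name)
    @lookups += 1
    @table[name] ||= name.dup.freeze
  end

  # Number of distinct strings actually stored.
  def size
    @table.size
  end
end

# Element names in the order a parser might meet them in a document
# with many repeated tags.
tags = %w[catalog item name price item name price item name price]

dict = NameDict.new
interned = tags.map { |t| dict.intern(t) }

puts "lookups: #{dict.lookups}, distinct strings stored: #{dict.size}"
# prints "lookups: 10, distinct strings stored: 4"

# Every "item" is the very same String object.
puts interned.select { |s| s == "item" }.map(&:object_id).uniq.size
# prints 1
```

Ten lookups leave only four stored strings. In libxml2 the same effect means each repeated element or attribute name skips its own allocation and copy, which is presumably where much of the speedup comes from.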

Rerunning the published benchmarks now shows libxml-ruby and Nokogiri to have equivalent performance. If you run the tests yourself, though, beware: the order in which the extensions are tested changes the results. Whichever extension is tested first is always faster, at least on my Fedora 10 box. I assume that's because the first parser has more memory available to it when the test begins and therefore triggers Ruby's garbage collector a few fewer times.
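
One way to take the ordering effect out of the numbers is to alternate which extension runs first and to count garbage collections during each measurement, which also lets you check the GC explanation directly. The sketch below is only an assumption about how such a harness might look, not the published benchmark: `run_balanced` is a hypothetical helper, and the two tasks are stand-ins (plain string scanning) that you would replace with the libxml-ruby and Nokogiri parse calls.

```ruby
require 'benchmark'

# Hypothetical harness: run tasks in alternating order (A,B then B,A, ...)
# so no task always goes first, and count GC runs during each measurement.
def run_balanced(tasks, rounds: 4)
  time = Hash.new(0.0)
  gc_runs = Hash.new(0)
  rounds.times do |i|
    order = i.even? ? tasks.keys : tasks.keys.reverse
    order.each do |name|
      GC.start                       # start each run from a freshly collected heap
      before = GC.count
      time[name] += Benchmark.realtime { tasks[name].call }
      gc_runs[name] += GC.count - before
    end
  end
  # [name, average seconds per run, total GC runs across all rounds]
  tasks.keys.map { |name| [name, time[name] / rounds, gc_runs[name]] }
end

# Hypothetical stand-in workloads; substitute the real parser calls here.
xml = "<root>" + "<item><name>x</name></item>" * 10_000 + "</root>"
tasks = {
  "scan"  => -> { xml.scan(/<item>/).size },
  "split" => -> { xml.split("<item>").size },
}

run_balanced(tasks).each do |name, avg, gcs|
  printf("%-6s avg %.4fs  GC runs %d\n", name, avg, gcs)
end
```

With an even number of rounds, each task goes first equally often, so any first-run advantage is spread evenly across both; the GC counts show whether the run that went first really did collect less often.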
