The Artima Developer Community
Sponsored Link

Ruby Buzz Forum
Aim for the Top! Beating the current #1 Wide Finder log analyzer.

0 replies on 1 page.

Welcome Guest
  Sign In

Go back to the topic listing  Back to Topic List Click to reply to this topic  Reply to this Topic Click to search messages in this forum  Search Forum Click for a threaded view of the topic  Threaded View   
Previous Topic   Next Topic
Flat View: This topic has 0 replies on 1 page
Eigen Class

Posts: 358
Nickname: eigenclass
Registered: Oct, 2005

Eigenclass is a hardcore Ruby blog.
Aim for the Top! Beating the current #1 Wide Finder log analyzer. Posted: Nov 5, 2007 10:24 AM
Reply to this message Reply

This post originated from an RSS feed registered with Ruby Buzz by Eigen Class.
Original Post: Aim for the Top! Beating the current #1 Wide Finder log analyzer.
Feed Title: Eigenclass
Feed URL: http://feeds.feedburner.com/eigenclass
Feed Description: Ruby stuff --- trying to stay away from triviality.
Latest Ruby Buzz Posts
Latest Ruby Buzz Posts by Eigen Class
Latest Posts From Eigenclass

Advertisement

Yesterday's goal was to beat the fastest "Wide Finder" log analyzer program. I have been successful; my best effort is nearly 3 times faster than the current number 1 in Tim Bray's table on the two-core machine I tested it on. Not bad considering that I'm essentially fighting against highly optimized standard libraries written in C (string searching and regexp matching).

I took Ilmari Heikkinen's OCaml implementation (referred to as wf-kig below) and improved it progressively, adding multicore support using JoCaml's join-calculus in the last step (the concurrent core, a mere 12 lines, is explained below). Here follows a summary of the optimizations, but let's see the numbers first. These are my results on a Core 2 Duo 2GHz, 2GB DDR2 667MHz machine (hot cache)*1:

programreal(s)user(s)sys(s)
wf-kig1.391.1040.284
wf1.1160.8340.282
wf-block0.7620.4720.286
wf-mmap0.5840.4470.136
wf-mmap-multicore0.367??
wf-2.py2.0571.7770.278
wf-6.py (2)1.012??
wf-6.py (1)1.737??

wf-6.py is the current leader in Tim Bray's classification. It is multicore-enabled; the above table lists the times I got with one and two processes (unsurprisingly, further processes don't make it any faster because the test machine has got only two cores). wf-2.py is the basic Python implementation wf-6.py was evolved from and is about the same size as wf-kig.ml or wf.ml. My optimizations mostly mirror those applied to wf-2.py in order to create wf-6.py.

My fast "Wide Finder" implementations can be found in the darcs repository.

First improvement: sublinear string search

wf-kig.ml processes the input one line at a time, matching each one against the "GET /ongoing/When/\d\d\dx..." regexp. The first optimization I did was splitting that regexp matching into two parts:

  1. find the "GET /ongoing/When/" substring in the line
  2. try to match the remaining regexp at that position

This allows me to use a sublinear string search algorithm for the first part. I took the Boyer-Moore implementation I had written for ICFP07, achieving a 25% speed boost (this is "wf" in the above table). Note that wf-2.py already uses sublinear searching, so it is equivalent to "wf" modulo the implementation language.

Block processing

The profiler indicated nearly half of the time was spent in the function that reads a single line from disk. I changed my code to read 50MB chunks and scan them in one go, with some care to take the incomplete line at the end of each chunk and copy it at the beginning of the next one. This brings a 46% speed gain in wf-block relative to the previous version ("wf").

Avoiding IO

In order to avoid useless copying between kernel and user space, I switched to a memory-mapped solution. I wrote a reusable Bigstring module that allows to use mmap'ed memory as an OCaml string and adapted the Boyer-Moore string search implementation. wf-mmap.ml itself takes less code than wf-block.ml, and is about 30% faster.

There is still some potential for optimization in the single-core case because I have to copy the parts of the line to be matched against the regexp to a buffer, owing to a limitation in the interface of the regexp library. Of course, the ultimate optimization would be to specialize the code for the particular regexp we're using.

At this point, wf-mmap processes 200MB in 0.584s, and is nearly twice faster on a single core than wf-6.py using two. However, Tim's machine has got 8 cores, so I went for a multicore-enable version.

Distributed programming with JoCaml

I adapted wf-mmap.ml to use several processes managed using JoCaml's join calculus. wf-mmap-multicore processes 200MB in 0.367s when using 2 processes, considerably faster than wf-6.py.

The join calculus is really neat, it allows you to think about concurrent processes at a higher level with fewer chances to deadlock or corrupt your data.

This is the code that controls the workers (if you have some time follow the introduction to concurrent and distributed programming with JoCaml, I can't recommend it enough); it differs a bit from what you'll find in the actual sources, because I just thought of this more elegant way as I was writing these lines:


Read more...

Read: Aim for the Top! Beating the current #1 Wide Finder log analyzer.

Topic: Town Meeting with Matz at RubyConf 2007 Previous Topic   Next Topic Topic: RubyConf 2007 Friday Morning - Jim Weirich

Sponsored Links



Google
  Web Artima.com   

Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use