Heron-Centric: Ruminations of a Language Designer
Heron Tackles the WideFinder Challenge
by Christopher Diggins
March 18, 2010

Summary
Tim Bray's WideFinder project is to write a simple log-file parsing program that runs fast on modern CPUs with low clock rates but many cores. I decided to tackle it with the newest Heron release (version 1.0 Alpha 2).

The Widefinder project started with a captivating series of blog posts by Tim Bray at Sun Microsystems. Tim stated a desire for a programming language with the ease of use and elegance of Ruby, but which could automatically exploit available hardware parallelism. Tim's working example for the was a simple log file parsing script.

This challenge sparked a massive programming language shoot-out. Tim and many others started comparing various languages and implementations of the log-file processing script in terms of raw processing speed.

I decided to tackle Wide Finder using the latest version of Heron. Rather than trying to achieve sheer processing speed I first wanted to see if I could get it to work, after all Heron is still just in Alpha releases. Next I wanted to see what improvements in speed I could get in running the interpreter in multi-threaded mode.

First I have to admit that the current Heron interpreter is not very efficient, after all it is still in Alpha, so the results are rather unimpressive when looked at on their own. On my machine it took 9 seconds to parse a 1 megabyte log file with about 10,000 lines when running in single-threaded mode. To put Heron in single-thread mode you have to set maxthreads value in the config.xml file to 1. However, when I flip the mode to multi-threaded by setting maxthreads to -1, it takes 30% less time to execute on a dual-core laptop.

The reason I get this speed-up for free is because my implementation is heavily CPU bound and the bottle-neck is in the sort function. The sort I use relies heavily on Heron's select operation, which is an implicitly parallelized operation in Heron.

In Heron the select operation takes a list and a predicate expression. It returns a new list which contains members of the original list for which the predicate returns true. For example select (x from xs) x % 2 will return all even numbers from the list xs.

The WideFinder sample program and test data is available in the latest Heron release, which is now Heron 1.0 Alpha 2. Below is the Heron WideFinder implementation:

module WideFinder
{
    imports
    {
        sorting = new Heron.Standard.Sorting();
        console = new Heron.Windows.Console();
    }
    methods
    {
        Main()
        {
            var lines = File.ReadAllLines(@"testdata\wide-finder-test-data.txt");
            var dict = table(url:String, cnt:Int) { };
            var re = new Regex(@"GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ \.]+)");

            WriteLine("Counting entries");

            foreach (line in lines)
            {
                var m = re.Match(line);
                if (m.get_Success())
                {
                    var c = m.get_Groups().ToList()[1].get_Captures().ToList()[0];
                    var s = c.get_Value();
                    if (!dict.HasKey(s))
                        dict.Add([s, 1]); 
                    else
                        dict[s].cnt = dict[s].cnt + 1;
                }
            }

            WriteLine("Counted the entries");
            WriteLine("Sorting the keys");
            
            var keys = dict.GetColumn(0);
            keys = Sort(
                keys,
                function (a, b) { 
                    return dict[b].cnt - dict[a].cnt; 
                }); 
            WriteLine("Sorted the keys");

            foreach (i in 0..9)
            {
                var key = keys[i];
                Console.WriteLine(key + " = " + dict[key].ToString());
            }
        }
    }
}

module Heron.Standard.Sorting
{
    methods
    {
        Sort(xs : List, compare : Function) : List 
        {
            if (xs.Count() < 2) 
                return xs;
            
            if (xs.Count() == 2)
                if (compare(xs[0], xs[1]) <= 0)
                    return xs; else
                    return [xs[1], xs[0]];

            var pivot = xs[0];
            var tail = xs.Slice(1, xs.Count() - 1);
            
            var as = select (x from tail) 
                compare(x, pivot) <= 0;

            var bs = select (x from tail) 
                compare(x, pivot) > 0;

            as = Sort(as, compare);
            bs = Sort(bs, compare);
            var r = as.ToList();
            r.Add(pivot);
            r.AddRange(bs);
            return r;
        }
    }
}

While it would be easy to find a number of flaws with my implementation, I think this this is still an interesting proof of concept. For me it demonstrates how parallelized list operations built into a language can be effective at leveraging multiple cores, without requiring extra work from a programmer.

Talk Back!

Have an opinion? Readers have already posted 4 comments about this weblog entry. Why not add yours?

RSS Feed

If you'd like to be notified whenever Christopher Diggins adds a new entry to his weblog, subscribe to his RSS feed.

Digg |

del.icio.us |

About the Blogger

Christopher Diggins is a software developer and freelance writer. Christopher loves programming, but is eternally frustrated by the shortcomings of modern programming languages. As would any reasonable person in his shoes, he decided to quit his day job to write his own ( www.heron-language.com ). Christopher is the co-author of the C++ Cookbook from O'Reilly. Christopher can be reached through his home page at www.cdiggins.com.


	Web Artima.com