The Artima Developer Community
Sponsored Link

Weblogs Forum
Heron Tackles the WideFinder Challenge

4 replies on 1 page. Most recent reply: Mar 28, 2010 10:27 PM by Michael Chang

Welcome Guest
  Sign In

Go back to the topic listing  Back to Topic List Click to reply to this topic  Reply to this Topic Click to search messages in this forum  Search Forum Click for a threaded view of the topic  Threaded View   
Previous Topic   Next Topic
Flat View: This topic has 4 replies on 1 page
Christopher Diggins

Posts: 1215
Nickname: cdiggins
Registered: Feb, 2004

Heron Tackles the WideFinder Challenge (View in Weblogs)
Posted: Mar 17, 2010 11:00 PM
Reply to this message Reply
Summary
Tim Bray's WideFinder project is to write a simple log-file parsing program that runs fast on modern CPUs with low clock rates but many cores. I decided to tackle it with the newest Heron release (version 1.0 Alpha 2).
Advertisement

The Widefinder project started with a captivating series of blog posts by Tim Bray at Sun Microsystems. Tim stated a desire for a programming language with the ease of use and elegance of Ruby, but which could automatically exploit available hardware parallelism. Tim's working example for the was a simple log file parsing script.

This challenge sparked a massive programming language shoot-out. Tim and many others started comparing various languages and implementations of the log-file processing script in terms of raw processing speed.

I decided to tackle Wide Finder using the latest version of Heron. Rather than trying to achieve sheer processing speed I first wanted to see if I could get it to work, after all Heron is still just in Alpha releases. Next I wanted to see what improvements in speed I could get in running the interpreter in multi-threaded mode.

First I have to admit that the current Heron interpreter is not very efficient, after all it is still in Alpha, so the results are rather unimpressive when looked at on their own. On my machine it took 9 seconds to parse a 1 megabyte log file with about 10,000 lines when running in single-threaded mode. To put Heron in single-thread mode you have to set maxthreads value in the config.xml file to 1. However, when I flip the mode to multi-threaded by setting maxthreads to -1, it takes 30% less time to execute on a dual-core laptop.

The reason I get this speed-up for free is because my implementation is heavily CPU bound and the bottle-neck is in the sort function. The sort I use relies heavily on Heron's select operation, which is an implicitly parallelized operation in Heron.

In Heron the select operation takes a list and a predicate expression. It returns a new list which contains members of the original list for which the predicate returns true. For example select (x from xs) x % 2 will return all even numbers from the list xs.

The WideFinder sample program and test data is available in the latest Heron release, which is now Heron 1.0 Alpha 2. Below is the Heron WideFinder implementation:

module WideFinder
{
    imports
    {
        sorting = new Heron.Standard.Sorting();
        console = new Heron.Windows.Console();
    }
    methods
    {
        Main()
        {
            var lines = File.ReadAllLines(@"testdata\wide-finder-test-data.txt");
            var dict = table(url:String, cnt:Int) { };
            var re = new Regex(@"GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ \.]+)");

            WriteLine("Counting entries");

            foreach (line in lines)
            {
                var m = re.Match(line);
                if (m.get_Success())
                {
                    var c = m.get_Groups().ToList()[1].get_Captures().ToList()[0];
                    var s = c.get_Value();
                    if (!dict.HasKey(s))
                        dict.Add([s, 1]); 
                    else
                        dict[s].cnt = dict[s].cnt + 1;
                }
            }

            WriteLine("Counted the entries");
            WriteLine("Sorting the keys");
            
            var keys = dict.GetColumn(0);
            keys = Sort(
                keys,
                function (a, b) { 
                    return dict[b].cnt - dict[a].cnt; 
                }); 
            WriteLine("Sorted the keys");

            foreach (i in 0..9)
            {
                var key = keys[i];
                Console.WriteLine(key + " = " + dict[key].ToString());
            }
        }
    }
}

module Heron.Standard.Sorting
{
    methods
    {
        Sort(xs : List, compare : Function) : List 
        {
            if (xs.Count() < 2) 
                return xs;
            
            if (xs.Count() == 2)
                if (compare(xs[0], xs[1]) <= 0)
                    return xs; else
                    return [xs[1], xs[0]];

            var pivot = xs[0];
            var tail = xs.Slice(1, xs.Count() - 1);
            
            var as = select (x from tail) 
                compare(x, pivot) <= 0;

            var bs = select (x from tail) 
                compare(x, pivot) > 0;

            as = Sort(as, compare);
            bs = Sort(bs, compare);
            var r = as.ToList();
            r.Add(pivot);
            r.AddRange(bs);
            return r;
        }
    }
}

While it would be easy to find a number of flaws with my implementation, I think this this is still an interesting proof of concept. For me it demonstrates how parallelized list operations built into a language can be effective at leveraging multiple cores, without requiring extra work from a programmer.


James Watson

Posts: 2024
Nickname: watson
Registered: Sep, 2005

Re: Heron Tackles the WideFinder Challenge Posted: Mar 18, 2010 5:17 PM
Reply to this message Reply
> The reason I get this speed-up for free is because my
> implementation is heavily CPU bound and the bottle-neck is
> in the sort function. The sort I use relies heavily on
> Heron's <code>select</code> operation, which is an
> implicitly parallelized operation in Heron.
>
> In Heron the <code>select</code> operation takes a list
> and a predicate expression. It returns a new list which
> contains members of the original list for which the
> predicate returns true. For example <code>select (x from
> xs) x % 2</code> will return all even numbers from the
> list <code>xs</code>.

Just curious... Doesn't this implicit parallelism require that lists always support random access? I'm not sure if lists are inherently built-in or if new list types may be defined. I think you said that Heron is Python inspired and therefore I might expect to be able to pass in any 'list-like' object to select.

Christopher Diggins

Posts: 1215
Nickname: cdiggins
Registered: Feb, 2004

Re: Heron Tackles the WideFinder Challenge Posted: Mar 18, 2010 9:42 PM
Reply to this message Reply
> Just curious... Doesn't this implicit parallelism require
> that lists always support random access?

Actually no. While it is easy to convert list-like things to something that supports random access, you can achieve parallelism by iterating over items in a sequence and sending them to different tasks (modulo the number of threads).

> I'm not sure if
> lists are inherently built-in

Yes.

> or if new list types may be
> defined. I think you said that Heron is Python inspired
> and therefore I might expect to be able to pass in any
> 'list-like' object to select.

So to be more precise the "list" operators in Heron (map, select, reduce, and accumulate) accepts what are really called sequences. Sequences are really interfaces that support two functions: ToList() and ToIterator(). Some examples of sequences are lists, arrays, and ranges.

James Watson

Posts: 2024
Nickname: watson
Registered: Sep, 2005

Re: Heron Tackles the WideFinder Challenge Posted: Mar 19, 2010 11:04 AM
Reply to this message Reply
> So to be more precise the "list" operators in Heron (map,
> select, reduce, and accumulate) accepts what are really
> called sequences. Sequences are really interfaces that
> support two functions: ToList() and ToIterator(). Some
> examples of sequences are lists, arrays, and ranges.

So would a custom type that supports ToList() be a valid target for the select operator?

Michael Chang

Posts: 2
Nickname: syndetic
Registered: Dec, 2008

Re: Heron Tackles the WideFinder Challenge Posted: Mar 28, 2010 10:27 PM
Reply to this message Reply
What I really like about this post has nothing to do with threading or efficiency but, rather, this Heron language. I had never seen or heard of it before until just now. I have to say, I kinda like it. I like how the imports and the methods are tucked away inside their own little groups surrounded by brackets. I do the same thing in other languages using comment characters for better readability.

Flat View: This topic has 4 replies on 1 page
Topic: Writing Software is Like ... Writing Previous Topic   Next Topic Topic: Software Development Has Stalled


Sponsored Links



Google
  Web Artima.com   

Copyright © 1996-2014 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use - Advertise with Us