This post originated from an RSS feed registered with Java Buzz
by Nick Lothian.
Original Post: Move the code to the data
Feed Title: BadMagicNumber
Feed URL: http://feeds.feedburner.com/Badmagicnumber
Feed Description: Java, Development and Me
The paper discusses the challenges around working with multi-petrabyte scientific datasets. There are some interesting approaches discussed (including Google's Map Reduce of courser).
However, I often wonder why this little gem from Dan Creswell never got more attention. He has written extensions to JavaSpaces that allow the code (think queries & processing instructions) to migrate to the data, instead of the code trying to download the data and process it. In a prototype system:
I then ran a test with each version submitting ten objects and then removing them from the queue. The code-uploading version was nearly 7 times as fast and I'm certain that as concurrency increases, the performance gap will get greater still as contention increases