The Artima Developer Community

Java Community News
IBM Releases MapReduce Tools

Frank Sommers


IBM Releases MapReduce Tools Posted: Mar 29, 2007 3:01 PM
Summary
A new IBM alphaWorks tool makes it easier to work with the popular distributed computing technique MapReduce and the open-source Apache MapReduce implementation, Hadoop.

MapReduce is a distributed computing technique popularized by Google. It extends the functional programming constructs map and reduce with the ability to execute in parallel across a compute cluster. Map iterates over the elements of a collection, applying a function to each element, while reduce computes a single value from the collection's elements. Map operations, and to a lesser extent reduce operations, can be performed in parallel, which speeds up both.
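To make the two constructs concrete, here is a minimal sketch using the standard Java Streams API on a single machine. It is not Hadoop code and says nothing about how Google or Hadoop implement the technique; the class name MapReduceSketch and the method totalLength are made up for illustration. The map step turns each word into its length, and the reduce step sums the lengths into one value. Because addition is associative, the same pipeline gives the same answer when run on a parallel stream.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical illustration class; not part of Hadoop or IBM's MapReduce Tools.
class MapReduceSketch {

    // Map each word to its length, then reduce the lengths to a single sum.
    static int totalLength(List<String> words, boolean parallel) {
        Stream<String> stream = parallel ? words.parallelStream() : words.stream();
        return stream
                .map(String::length)        // map: one independent call per element
                .reduce(0, Integer::sum);   // reduce: combine all values into one
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("map", "reduce", "hadoop");
        System.out.println(totalLength(words, false)); // sequential: 3 + 6 + 6 = 15
        System.out.println(totalLength(words, true));  // parallel: same result
    }
}
```

The map calls do not depend on one another, which is why they parallelize easily. The reduce step can only be split up when the combining function is associative, as it is here.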

Hadoop, an Apache project, is an open-source implementation of the MapReduce technique and a distributed computing framework built around it. With its origins in the distributed file system of Nutch, a Lucene subproject, Hadoop can execute map operations on large files by automatically splitting a large file into smaller segments:

As the Map operation is parallelized the input file set is first split to several pieces called FileSplits. If an individual file is so large that it will affect seek time it will be split to several Splits. The splitting does not know anything about the input file's internal logical structure, for example line-oriented text files are split on arbitrary byte boundaries. Then a new map task is created per FileSplit.
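The consequence of splitting on arbitrary byte boundaries can be sketched in plain Java. The code below is an illustration under stated assumptions, not Hadoop's implementation: the class SplitSketch and both methods are hypothetical, the "file" is an in-memory byte array, and the rule that a split owns every line whose first byte falls inside it is just one possible convention a line-oriented reader could use to recover whole lines from content-blind splits.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not Hadoop's FileSplit or its record readers.
class SplitSketch {

    // Cut [0, length) into byte ranges {start, end} of at most splitSize bytes,
    // without looking at the content.
    static List<int[]> computeSplits(int length, int splitSize) {
        List<int[]> splits = new ArrayList<>();
        for (int start = 0; start < length; start += splitSize) {
            splits.add(new int[] {start, Math.min(start + splitSize, length)});
        }
        return splits;
    }

    // Assumed convention: a split owns each line whose first byte lies in [start, end).
    static List<String> linesForSplit(byte[] data, int start, int end) {
        int pos = start;
        // If the split begins mid-line, skip ahead to the start of the next line.
        if (pos > 0 && data[pos - 1] != '\n') {
            while (pos < data.length && data[pos] != '\n') pos++;
            pos++; // step past the newline
        }
        List<String> lines = new ArrayList<>();
        while (pos < end && pos < data.length) {
            int lineEnd = pos;
            // Read to the end of the line, even if that runs past the split's end.
            while (lineEnd < data.length && data[lineEnd] != '\n') lineEnd++;
            lines.add(new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8));
            pos = lineEnd + 1;
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbeta\ngamma\n".getBytes(StandardCharsets.UTF_8);
        for (int[] s : computeSplits(data.length, 7)) {
            System.out.println("bytes " + s[0] + ".." + s[1] + " -> " + linesForSplit(data, s[0], s[1]));
        }
    }
}
```

With a split size of 7 bytes, the first boundary falls in the middle of "beta". Under the convention assumed here, the first split reads "beta" to its end and the second split skips the partial line, so each line is handled exactly once.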

Hadoop also provides several other facilities for operating on large amounts of data, such as search indexes, in a distributed fashion.

IBM's alphaWorks project recently released MapReduce Tools, an Eclipse-based tool for working with MapReduce, and with Hadoop in particular. According to the project's documentation:

The plug-in automatically creates projects with the Hadoop libraries for development and testing. Templates for MapReduce drivers are also provided. After a project is completed, the plug-in uses SCP (secure copy) to deploy the code to a Hadoop server and then remotely executes it via SSH (secure shell). During execution, the plug-in communicates with the Hadoop task tracker via HTTP and displays the job status.

What do you think of Hadoop and IBM's new MapReduce Tools?

