The Artima Developer Community
Sponsored Link

Agile Buzz Forum
Google's Map-Reduce

0 replies on 1 page.

Welcome Guest
  Sign In

Go back to the topic listing  Back to Topic List Click to reply to this topic  Reply to this Topic Click to search messages in this forum  Search Forum Click for a threaded view of the topic  Threaded View   
Previous Topic   Next Topic
Flat View: This topic has 0 replies on 1 page
Keith Ray

Posts: 658
Nickname: keithray
Registered: May, 2003

Keith Ray is multi-platform software developer and Team Leader
Google's Map-Reduce Posted: May 23, 2005 7:05 AM
Reply to this message Reply

This post originated from an RSS feed registered with Agile Buzz by Keith Ray.
Original Post: Google's Map-Reduce
Feed Title: MemoRanda
Feed URL: http://homepage.mac.com/1/homepage404ErrorPage.html
Feed Description: Keith Ray's notes to be remembered on agile software development, project management, oo programming, and other topics.
Latest Agile Buzz Posts
Latest Agile Buzz Posts by Keith Ray
Latest Posts From MemoRanda

Advertisement

I went to Google's open house on May 5 (which happened to be 5/05/2005, but that's probably not significant) and listened to a very interesting set of lectures (and a genuine professional "Engineer/Comedian"). One of the interesting bits is a simple algorithmic infrastructure that Google uses for the index of web-pages, and some other purposes: the "MapReduce" programming model.

Google indexes the web using a sequence of 24 MapReduce operations. These operations are specified by a custom high level interpreted programming langauge, and distributed across thousands of rack-mounted dual-processor PCs. Using their fault-tolerant parallel processing infrastructure (implemented in C++).

The input data is a few thousand terabytes of data represented as files in Google's custom distributed/fault tolerant file system. The output and intermediate output data are several hundred terabytes.

As you might expect, the system is i/o bound -- disk i/o and network i/o. Thus the overhead of an interpreted language for map-reduce operations is not the constraint when doing these big jobs. They have to build their own custom gigabit ethernet switches to handle a thousand (or more) rack-mounted PCs -- commercial ethernet switches top out at 64 PCs and cost too much.

Their system has to be fault tolerant, since having at least one PC die (or otherwise go out of commission) each day is not unusual for each of their data-centers.

Read: Google's Map-Reduce

Topic: Class Creation question Previous Topic   Next Topic Topic: Industry Expert Round Table

Sponsored Links



Google
  Web Artima.com   

Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use