The Artima Developer Community
Java Buzz Forum
Hadoop and Mike Cannon-Brookes on using Lucene for Data rather than Text

Dion Almaer is the Editor-in-Chief for TheServerSide.com, and is an enterprise Java evangelist
Hadoop and Mike Cannon-Brookes on using Lucene for Data rather than Text Posted: Mar 22, 2007 8:58 PM

This post originated from an RSS feed registered with Java Buzz by dion.
Original Post: Hadoop and Mike Cannon-Brookes on using Lucene for Data rather than Text
Feed Title: techno.blog(Dion)
Feed URL: http://feeds.feedburner.com/dion
Feed Description: blogging about life the universe and everything tech


Mike kindly started the presentation with a consumer warning, letting us know in advance that he was going to be pimping JIRA (because this was going to be case study-esque).

These days JIRA uses Lucene for "Generic Data Indexing": fast retrieval of complex data objects. This isn't about text searching for "dog" sorted by relevance. The statistics pages all come back from a Lucene index, not from the DB.
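To make the idea concrete, here is a minimal sketch of "generic data indexing" in plain Java. This is a toy model, not Lucene's API and not JIRA's actual code: each issue is a bag of field/value pairs, and an inverted index maps field:value keys to issue ids, so an aggregate query like a statistics page can be answered from the index without touching the primary store.

```java
import java.util.*;

// Toy inverted index over structured fields (not Lucene's API).
public class FieldIndex {
    private final Map<String, Set<Integer>> index = new HashMap<>();

    // Index one object: every field/value pair becomes a posting list entry.
    public void add(int id, Map<String, String> fields) {
        for (Map.Entry<String, String> e : fields.entrySet()) {
            index.computeIfAbsent(e.getKey() + ":" + e.getValue(),
                                  k -> new TreeSet<>()).add(id);
        }
    }

    // Answer a "statistics page" style count straight from the index.
    public int count(String field, String value) {
        return index.getOrDefault(field + ":" + value,
                                  Collections.emptySet()).size();
    }

    public static void main(String[] args) {
        FieldIndex idx = new FieldIndex();
        idx.add(1, Map.of("status", "open", "priority", "critical"));
        idx.add(2, Map.of("status", "open", "priority", "minor"));
        idx.add(3, Map.of("status", "closed", "priority", "critical"));
        System.out.println(idx.count("status", "open"));       // 2
        System.out.println(idx.count("priority", "critical")); // 2
    }
}
```

Lucene does essentially this (plus analysis, persistence, and scoring) when you store non-text fields in a Document, which is why the pattern scales past plain text search.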

Lucene has a way for you to write your own sort routines via the Sort and SortField classes.

I have seen the "viral Lucene" pattern apply in a variety of projects. You start out using it for /search, and then you see that you can use it for other things. Slowly your DB is doing less, and your Lucene indexes are growing. This is a killer open source project, even if the API is a little weird.

Hadoop: Open Source MapReduce

I had a couple of people ask "why hasn't Google open sourced its MapReduce?" They didn't know about Hadoop:

Hadoop is a framework for running applications on large clusters of commodity hardware. The Hadoop framework transparently provides applications both reliability and data motion. Hadoop implements a computational paradigm named map/reduce, where the application is divided into many small fragments of work, each of which may be executed or reexecuted on any node in the cluster. In addition, it provides a distributed file system that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. Both map/reduce and the distributed file system are designed so that node failures are automatically handled by the framework.
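The map/reduce paradigm described above can be sketched in a few lines of plain Java. This is a toy word count, not Hadoop's API: the "map" step emits words from each input fragment, and the "reduce" step sums the counts per word. In Hadoop, those fragments would be distributed across cluster nodes and re-executed on failure.

```java
import java.util.*;
import java.util.stream.*;

// Minimal map/reduce sketch (not Hadoop's API): word count over lines.
public class WordCount {
    public static Map<String, Long> mapReduce(List<String> lines) {
        return lines.stream()
            // map: each line fragment -> a stream of (word) emissions
            .flatMap(line -> Arrays.stream(line.split("\\s+")))
            // reduce: group identical words and sum their counts
            .collect(Collectors.groupingBy(w -> w, Collectors.counting()));
    }

    public static void main(String[] args) {
        Map<String, Long> counts = mapReduce(List.of("the cat", "the dog"));
        System.out.println(counts.get("the")); // 2
    }
}
```

Because the map step is independent per fragment and the reduce step only sees grouped values, both stages parallelize naturally, which is what lets the framework spread the work over a cluster and rerun lost fragments.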

The intent is to scale Hadoop up to handling thousands of computers. Hadoop has been tested on clusters of 600 nodes.

Hadoop is a Lucene sub-project that contains the distributed computing platform that was formerly a part of Nutch. This includes the Hadoop Distributed Filesystem (HDFS) and an implementation of map/reduce.

For more information about Hadoop, please see the Hadoop wiki.

Christophe Bisciglia of the open source group has put great effort into University of Washington classes where Hadoop is used in the curriculum.

