The Artima Developer Community

Java Buzz Forum
Reverse PageRank from IBM

Michael Cote

Posts: 10306
Nickname: bushwald
Registered: May, 2003

Cote is a programmer in Austin, Texas.
Reverse PageRank from IBM Posted: Feb 5, 2004 5:49 AM

This post originated from an RSS feed registered with Java Buzz by Michael Cote.
Original Post: Reverse PageRank from IBM
Feed Title: Cote's Weblog: Coding, Austin, etc.
Feed URL: https://cote.io/feed/
Feed Description: Using Java to get to the ideal state.

That theory suggests that the best way to find information on the Web is to look at the biggest and most popular sites and Web pages. Hubs, for example, are usually defined as Web portals and expert communities. Similarly, the concept of authorities rests on identifying the most important Web pages, including looking at the number and influence of other pages that link to them. The latter concept is mirrored in Google's main algorithm, called PageRank.

IBM applied the same concepts in an early Web data-mining project called Clever, but shortcomings eventually led researchers to turn the theory of hubs and authorities on its head. In short, IBM found that it could excavate more interesting data from pages that the theory of hubs and authorities normally pushed to the bottom of the heap--unstructured pages like discussion boards, Web logs, newsgroups and other pages. With that insight, WebFountain was born.

"We're looking at...the low-level grungy pages," said Gruhl.
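The hubs-and-authorities idea in the quoted passage can be sketched in a few lines. The version below follows the scheme usually called HITS: a page's authority score is the sum of the hub scores of the pages linking to it, and its hub score is the sum of the authority scores of the pages it links to. This is only a minimal illustration under stated assumptions, not what Clever or WebFountain actually ran; the function names, the iteration count, and the tiny example graph are all made up for the sketch.

```python
from math import sqrt

# Hypothetical toy link graph: page -> pages it links to.
EXAMPLE_LINKS = {
    "portal": ["docs", "news"],
    "blog": ["docs"],
    "docs": [],
    "news": [],
}


def normalize(scores):
    """Scale scores to unit Euclidean length so they stay bounded."""
    norm = sqrt(sum(v * v for v in scores.values())) or 1.0
    return {page: v / norm for page, v in scores.items()}


def hits(links, iterations=50):
    """Return (hub, authority) score dicts for a link graph.

    The fixed iteration count is an assumption made for simplicity;
    a real implementation would test for convergence instead.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {page: 1.0 for page in pages}
    auth = {page: 1.0 for page in pages}

    for _ in range(iterations):
        # Authority: sum of the hub scores of pages that link in.
        new_auth = {page: 0.0 for page in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]

        # Hub: sum of the (updated) authority scores of pages linked to.
        new_hub = {
            page: sum(new_auth[dst] for dst in links.get(page, ()))
            for page in pages
        }

        auth = normalize(new_auth)
        hub = normalize(new_hub)

    return hub, auth


if __name__ == "__main__":
    hub, auth = hits(EXAMPLE_LINKS)
    for page in sorted(auth, key=auth.get, reverse=True):
        print(f"{page}: authority={auth[page]:.3f} hub={hub[page]:.3f}")
```

In the toy graph, "docs" ends up as the strongest authority and "portal" as the strongest hub, while a page nobody links to, like "blog", scores zero as an authority. Those bottom-of-the-heap pages are the ones the article says WebFountain went digging in.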

The rest of the article is a complete dork-idea-fest: there are mentions of NLP, allusions to Ye Olde Semantic Web, reputation systems, and drool-inducing machine talk like the following:

A main cluster consists of 32 eight-server racks running dual 2.4GHz Intel Xeon processors, capable of writing 10GB of data per second to disk. Each rack has 5 terabytes of storage, for a total of 40 terabytes for the system.

The three clusters together currently run a total of 768 processors, and that number is growing fast.

The cluster and storage is migrating to blade servers this year, which will save space and provide a total of 896 processors for data mining and 256 for storage. In total, the system will add 1,152 processors, allowing it to collect and store as many as 8 billion Web pages within 24 hours.
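For a rough feel of what those numbers imply, here is a back-of-the-envelope calculation using only the figures quoted above. It assumes "dual" means two processors per server and that the 8 billion pages are spread evenly over the 24 hours; both are assumptions for the sake of the arithmetic, and the function names are made up.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def main_cluster_processors(racks=32, servers_per_rack=8, cpus_per_server=2):
    """Processors in the main cluster, assuming two CPUs per 'dual' server."""
    return racks * servers_per_rack * cpus_per_server


def blade_processors(mining=896, storage=256):
    """Total processors after the quoted blade migration."""
    return mining + storage


def pages_per_second(pages=8_000_000_000, seconds=SECONDS_PER_DAY):
    """Average crawl rate needed to hit the quoted daily page count."""
    return pages / seconds


if __name__ == "__main__":
    total = blade_processors()
    rate = pages_per_second()
    print("main cluster processors:", main_cluster_processors())
    print("blade processors:", total)
    print(f"pages per second: {rate:,.0f}")
    print(f"pages per second per processor: {rate / total:.1f}")
```

The 896 and 256 do add up to the quoted 1,152, and 8 billion pages a day works out to a bit over 92,000 pages per second, or roughly 80 pages per second per processor.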

Read: Reverse PageRank from IBM

