Reverse PageRank from IBM
That theory [of hubs and authorities] suggests that the best way to find information on the Web is to look at the biggest and most popular sites and Web pages. Hubs, for example, are usually defined as Web portals and expert communities. Similarly, the concept of authorities rests on identifying the most important Web pages, including looking at the number and influence of other pages that link to them. The latter concept is mirrored in Google's main algorithm, called PageRank.
IBM applied the same concepts in an early Web data-mining project called Clever, but shortcomings eventually led researchers to turn the theory of hubs and authorities on its head. In short, IBM found that it could excavate more interesting data from pages that the theory of hubs and authorities normally pushed to the bottom of the heap: unstructured pages like discussion boards, Web logs and newsgroups. With that insight, WebFountain was born.
"We're looking at...the low-level grungy pages," said Gruhl.
The rest of the article is a complete dork-idea-fest: there are mentions of NLP, allusions to Ye Olde Semantic Web, reputation systems, and drool-inducing machine talk like the following:
A main cluster consists of 32 eight-server racks running dual 2.4GHz Intel Xeon processors, capable of writing 10GB of data per second to disk. Each rack has 5 terabytes of storage, for a total of 40 terabytes for the system.
The three clusters together currently run a total of 768 processors, and that number is growing fast.
The clusters and storage are migrating to blade servers this year, which will save space and provide a total of 896 processors for data mining and 256 for storage. In total, the system will run 1,152 processors, allowing it to collect and store as many as 8 billion Web pages within 24 hours.
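For the curious, the quoted processor counts do hang together. This is my back-of-the-envelope arithmetic using the article's figures, nothing more:

```java
/** Sanity check on the quoted WebFountain cluster figures. */
public class ClusterMath {
    public static void main(String[] args) {
        int mainCluster = 32 * 8 * 2; // 32 racks x 8 servers x dual Xeons = 512
        int allThree = 768;           // quoted total across the three clusters
        System.out.println("other two clusters hold: " + (allThree - mainCluster)); // 256
        System.out.println("after the blade move:    " + (896 + 256));              // 1,152
    }
}
```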