Featured Research

from universities, journals, and other organizations

Sifting Through The Jumble: A Cornell Researcher Finds A New Way Of Retrieving Just The Right Information From The Web

Date:
April 20, 1998
Source:
Cornell University
Summary:
The World Wide Web is an endless source of information, but with literally millions of pages posted by everyone from governments, universities and corporations to sixth-graders and conspiracy theorists, it's getting harder and harder to find precisely the "right" information.

ITHACA, N.Y. -- The World Wide Web is an endless source of information, but with literally millions of pages posted by everyone from governments, universities and corporations to sixth-graders and conspiracy theorists, it's getting harder and harder to find precisely the "right" information.

Now a Cornell University researcher has come up with a method of searching the web that can return a list of the most valuable sites on a given topic, as well as a list of sites that index the subject. Early tests of the method have produced highly focused lists of sites on many topics, often comparable to lists carefully compiled by web search experts.

The method was developed by Jon Kleinberg, Cornell professor of computer science. An evaluation of the method was presented at the seventh International World Wide Web Conference held April 14-18 in Brisbane, Australia, in a paper by Kleinberg, David Gibson of the Department of Computer Science, University of California at Berkeley, and several IBM researchers.

Popular web-searching tools, known as engines, such as Yahoo! and AltaVista, work by hunting for keywords in the text of web pages. On some topics this can return hundreds or even thousands of pages. The algorithm (a set of rules specifying how to solve the problem) developed by Kleinberg instead works by analyzing the way web pages are linked to one another. The assumption behind this is that the most authoritative pages on a given subject will be those that are most often pointed to by other pages.

The web is annotated with "precisely the type of human judgment we need to identify authority," Kleinberg explains. "It almost says something about the way the web has evolved. I think it's about the way people link information in general, not just on the web."

Kleinberg's method does more than just identify pages with useful information about a topic, which he calls "authorities." The method also looks for pages that contain many links to pages with useful information on the topic, which he calls "hubs."

The best authorities, Kleinberg says, will be those that are pointed to by the best hubs, and the best hubs will be the ones that point to the best authorities. Kleinberg prevents this from becoming a circular definition by recalculating the relationship several times, each time moving closer to the ideal result.

He has written a search program using this technique called HITS (for Hyperlink-Induced Topic Search). HITS begins by conducting an ordinary text-based search on a topic using a search engine such as AltaVista. This collects a "root set" of about 200 pages that contain the entered keywords. It then expands the set to include all the pages linked to by pages in the root set. The expanded set might include from 1,000 to 3,000 pages.
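The expansion step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Kleinberg's program: the `links` dictionary, the page names and the `expand_root_set` function are all hypothetical, and the root set is supplied by hand rather than fetched from a search engine.

```python
def expand_root_set(root_set, links):
    """Return the root set plus every page that a root page links to.

    `links` maps each page to the list of pages it points to
    (a hypothetical, hand-built stand-in for crawled link data).
    """
    expanded = set(root_set)
    for page in root_set:
        # Pages with no recorded outgoing links contribute nothing extra.
        expanded.update(links.get(page, []))
    return expanded


# Toy link data; every page name here is made up for illustration.
links = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "d.example": ["a.example"],
}
root_set = {"a.example", "d.example"}
print(sorted(expand_root_set(root_set, links)))
```

In this toy case the two-page root set grows to four pages, in the same way that a root set of about 200 pages grows to a few thousand.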

From there on, text is ignored, and the application only looks at the way pages in the expanded set are linked to one another. The first time through, it identifies the pages that are pointed to most often by other pages, and assigns them a score, or "weight," indicating that they are more likely to be authorities. At the same time it notes the pages that contain more links to other pages and gives them more weight as hubs.

This calculation is repeated several times. Each time the program gives more authority weight to sites that are linked to by sites with more hub weight, and more hub weight to sites that link to sites with more authority weight. Ten repetitions, Kleinberg says, are enough to return surprisingly focused lists of authorities and hubs.
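The repeated calculation can be illustrated with a short Python sketch. It is a minimal version written for this article's description, not the HITS program itself: the function names, the toy link data and the page names are hypothetical, and the rescaling of the weights after each pass is an assumption added here to keep the numbers bounded. The default of ten passes follows the figure Kleinberg quotes.

```python
import math


def normalize(weights):
    """Scale weights so their squares sum to 1 (an assumption of this sketch)."""
    norm = math.sqrt(sum(w * w for w in weights.values()))
    if norm == 0:
        return dict(weights)
    return {page: w / norm for page, w in weights.items()}


def hits(links, iterations=10):
    """Return (hub, authority) weights for a dict of page -> linked pages."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {page: 1.0 for page in pages}
    auth = {page: 1.0 for page in pages}

    for _ in range(iterations):
        # Authority weight: sum of the hub weights of pages pointing here.
        new_auth = {page: 0.0 for page in pages}
        for source, targets in links.items():
            for target in targets:
                new_auth[target] += hub[source]
        # Hub weight: sum of the authority weights of the pages pointed to.
        new_hub = {
            page: sum(new_auth[t] for t in links.get(page, []))
            for page in pages
        }
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return hub, auth


# Toy link data; every page name here is made up for illustration.
links = {
    "hub1": ["auth1", "auth2"],
    "hub2": ["auth1", "auth2"],
    "hub3": ["auth1"],
}
hub, auth = hits(links)
print(max(auth, key=auth.get))  # page with the largest authority weight
```

In this toy graph "auth1" is pointed to by all three hub pages and ends up with the largest authority weight, while "hub3", which links to only one of the two authorities, receives less hub weight than the other two.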

The system overcomes several of the problems frequently identified with text-based searches. For example, at one time a text-based search for "Gates" didn't return the Microsoft Corp. home page because Microsoft chairman Bill Gates wasn't mentioned on the opening page. (He still isn't, but now his biography can be found by following the link "About Microsoft.") A search for "jaguar" returns a jumble of pages about cars, animals, the Jacksonville Jaguars NFL team, and the obsolete but still much-discussed Atari Jaguar computer.

In a case where a word represents more than one topic, Kleinberg's method automatically separates sites into "communities" of hubs and authorities, each representing one of the possible topics. Thus a HITS search on "jaguar" lists first a community of sites related to the Jaguar computer, because web sites on this subject predominate. Further down, it lists communities relating to the football team and the car. Finally it finds sparse information relating to the animal, because this topic is simply not well represented on the web, Kleinberg says.

Communities also form when a topic is polarized: A search on "abortion" returns separate communities of pro-life and pro-choice sites, because the sites within each community link more densely to one another than to sites advocating an opposing view.

One disadvantage of the method, Kleinberg says, is that it doesn't always work for sharply focused queries. A search for "Netscape 4.04," for example, returns a general list of sites about web browsers.

The paper being presented in Brisbane is titled "Automatic Resource List Compilation by Analyzing Hyperlink Structure and Associated Text." Another paper by Kleinberg, "Authoritative Sources in a Hyperlinked Environment," was published in the Proceedings of the 9th ACM-SIAM Symposium on Discrete Algorithms, 1998. A related paper, "Inferring Web Communities from Link Topology," by Kleinberg, Gibson and Prabhakar Raghavan of the IBM Almaden Research Center, appears in the Proceedings of the 9th ACM Conference on Hypertext and Hypermedia, 1998.

The texts of these papers can be found on Kleinberg's web page at http://www.cs.cornell.edu/home/kleinber/.

Kleinberg developed the method while working as a visiting scientist at IBM's Almaden Research Center, on leave from Cornell. IBM has applied for a patent on the algorithm.


Story Source:

The above story is based on materials provided by Cornell University. Note: Materials may be edited for content and length.


Cite This Page:

Cornell University. "Sifting Through The Jumble: A Cornell Researcher Finds A New Way Of Retrieving Just The Right Information From The Web." ScienceDaily. ScienceDaily, 20 April 1998. <www.sciencedaily.com/releases/1998/04/980420080706.htm>.
Cornell University. (1998, April 20). Sifting Through The Jumble: A Cornell Researcher Finds A New Way Of Retrieving Just The Right Information From The Web. ScienceDaily. Retrieved September 20, 2014 from www.sciencedaily.com/releases/1998/04/980420080706.htm
Cornell University. "Sifting Through The Jumble: A Cornell Researcher Finds A New Way Of Retrieving Just The Right Information From The Web." ScienceDaily. www.sciencedaily.com/releases/1998/04/980420080706.htm (accessed September 20, 2014).
