Science News

Search Engines Biased, Out-Of-Date, And Index No More Than 16% Of The Web

ScienceDaily (July 12, 1999) — A new NEC Research Institute study analyzes the accessibility and distribution of information on the web. The study was conducted by Dr. Steve Lawrence and Dr. C. Lee Giles and will appear in the July 8 issue of the journal Nature.

-- LOW COVERAGE -- Search engine coverage has decreased substantially since December 1997, with no engine indexing more than about 16% of the publicly indexable web.

-- UNEQUAL ACCESS -- Search engines are more likely to index sites that have more links to them (more 'popular' sites). They are also typically more likely to index US sites than non-US sites, and more likely to index commercial sites than educational sites.

-- OUT-OF-DATE -- Indexing of new or modified pages by just one of the major search engines can take months.

-- AMOUNT OF INFORMATION -- The publicly indexable web contains about 800 million pages encompassing about 15 terabytes of data (about 6 terabytes of textual content after removing HTML tags, comments, and extra whitespace); it also contains about 180 million images.
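The figures quoted above imply a few per-page averages that are easy to work out. The short Python sketch below does the arithmetic; it is a rough illustration rather than part of the study. It assumes 1 terabyte means 10**12 bytes (the article does not say which definition is used), and the function name `web_estimates` and the constant names are made up for this example.

```python
# Back-of-the-envelope figures derived from the numbers quoted in the article.
# Assumption: 1 terabyte = 10**12 bytes (the article does not specify).

TOTAL_PAGES = 800_000_000     # publicly indexable pages (approximate)
TOTAL_BYTES = 15 * 10**12     # about 15 terabytes of data
TEXT_BYTES = 6 * 10**12       # about 6 terabytes of textual content
TOTAL_IMAGES = 180_000_000    # about 180 million images
MAX_COVERAGE = 0.16           # no engine indexes more than about 16%


def web_estimates():
    """Return rough averages implied by the article's totals."""
    return {
        # Average size of a page, in kilobytes (1 KB = 1000 bytes here).
        "avg_page_kb": TOTAL_BYTES / TOTAL_PAGES / 1000,
        # Average amount of text per page after markup is stripped.
        "avg_text_kb": TEXT_BYTES / TOTAL_PAGES / 1000,
        # Average number of images per page.
        "images_per_page": TOTAL_IMAGES / TOTAL_PAGES,
        # Upper bound on the pages indexed by any single engine.
        "max_indexed_pages": round(TOTAL_PAGES * MAX_COVERAGE),
    }


for name, value in web_estimates().items():
    print(f"{name}: {value}")
```

Under these assumptions, an average page is about 18.75 KB, of which about 7.5 KB is text, and even the engine with the best coverage indexes at most about 128 million pages. Because every input is itself an approximation, these results are order-of-magnitude estimates only.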

-- TYPE OF INFORMATION -- 83% of sites contain commercial content and 6% contain scientific/educational content. Only 1.5% of sites contain pornographic content.

The web is transforming society, and search engines are an important part of the process. For example, consumers use search engines to locate and buy goods or to research many decisions (such as choosing a vacation destination, a medical treatment, or how to vote in an election).

Search engine indexing and ranking may have economic, social, political, and scientific effects. For example, indexing and ranking of online stores can substantially affect their economic viability; delayed indexing of scientific research can lead to the duplication of work or slower progress; and delayed or biased indexing may affect social or political decisions.

One of the great promises of the web is to equalize access to information. As the web fast becomes a major communications medium, attention should be paid to the accessibility of information on the web, in order to minimize unequal access to information, and maximize the benefits of the web for society.

For more information see http://wwwmetrics.com.

###

The NEC Research Institute conducts long-term, fundamental research in computer and physical sciences. The mission of the Institute is to contribute significant new understanding of computer and communication (C&C) technologies for the future. Institute research activities have a long-term goal of significant advances in the understanding of intelligence and information processing in biological and machine systems, and in the physical and system aspects of future computer architectures.


Story Source:

Adapted from materials provided by NEC Research Institute.
