
Hopkins-Led Team Developing New Ways To Handle Flood Of Data

September 27, 1999
Johns Hopkins University
The fountain of scientific data has become a fire hose and is turning into a raging river. A Johns Hopkins-led consortium is working on ways to handle the information overload faced by scientists.

The fountain of information at the heart of science has become a fire hose, and an increase to river-like volumes is on the way. The CERN particle collider in Geneva, Switzerland, for instance, currently produces more than 1 petabyte, or about 1,000,000,000,000,000 bytes, of information every year. The words and other text in all the books in the Library of Congress, in contrast, add up to only about one-thousandth of that information, or one terabyte (1 trillion bytes). And CERN is just one example of the tremendous information-generating powers of modern science.
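The scale comparison above is simple arithmetic in decimal storage units (the article uses powers of ten, so 1 petabyte = 10^15 bytes). A quick sketch:

```python
# Decimal storage units, as used in the article (1 PB = 10**15 bytes).
TERABYTE = 10**12
PETABYTE = 10**15

cern_per_year = 1 * PETABYTE             # CERN's stated annual output
library_of_congress_text = 1 * TERABYTE  # article's estimate for all LoC text

# CERN produces roughly a thousand Libraries of Congress of text per year.
ratio = cern_per_year // library_of_congress_text
print(ratio)  # 1000
```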


"Our current ways of doing science are very much based on the concept that our data sets are so small that we can sort of ‘eyeball' the whole thing and locate the interesting data," says Alexander Szalay, Alumni Centennial Professor of Physics and Astronomy at The Johns Hopkins University. "And with the data sets we are getting in an increasing number of areas of science, this is just not going to be feasible. So we have to do something drastically different."

Szalay leads an interdisciplinary team of researchers developing new ways to store, access and search large volumes of data. Participants in the Hopkins-led collaborative include scientists from Caltech, the U.S. Department of Energy's Fermilab and Microsoft Corp. They have been working together for several years; this month they will receive the first formal support for their efforts: a three-year, $2.5 million grant from the National Science Foundation.

"This problem is of course much bigger than astronomy or particle physics," Szalay says. "I think this is actually becoming more a problem for the whole society. We are choking on information, and we have to sort out the relevant from the irrelevant. So I think what we're doing is a very interesting test bed for experimenting with new technologies that could have broader applications elsewhere."

Particle physicists were among the first to have to deal with huge quantities of information. Their work to manage that information led to the development of tools and techniques that found uses beyond the realm of the physics lab, notes Aihud Pevsner, Jacob P. Hain Professor of Physics and Astronomy at Johns Hopkins and a member of the collaborative.

"To help work with large data sets at CERN, Tim Berners-Lee invented in 1989 what later became the World Wide Web," says Pevsner. "He did it because the tools that they had at the time were inadequate for the distribution of the data sets they were working with."

Pevsner, a particle physicist, will be one of 500 American physicists working at the Large Hadron Collider (LHC) at CERN, the world's most powerful particle collider. The LHC is expected to produce 100-petabyte data sets.

Szalay is a researcher for the Sloan Digital Sky Survey (SDSS), an effort he calls the "cosmic genome project," which will map everything visible in several large chunks of the northern and southern sky. SDSS begins next year; before it is over, Szalay estimates, it will produce 40 terabytes of data and a 2-terabyte catalog.

Such a high volume of data reduces the chances that astronomers will miss gathering important information, but it also makes it harder to find that information among what's been gathered. "When you have so much data that it chokes you, you have to keep breaking it up into smaller chunks until it no longer chokes you," Szalay says.

Developing better ways to break down large quantities of information is the first major component of research under the NSF grant. The SDSS information, for example, might be broken up both by the area of the sky that the data comes from and by the color of the objects observed in the sky. The challenge, though, is to make sure that this process of partitioning the data improves the scientists' abilities to see important patterns and irregularities in the data.
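The partitioning idea can be sketched in a few lines: bucket each observation by a coarse sky cell and a color bin, so records likely to be hit by the same query end up stored together. This is an illustrative sketch only; the field names and bin sizes are hypothetical, not SDSS's actual scheme.

```python
from collections import defaultdict

def partition_key(ra_deg, dec_deg, color_index, cell_size=10.0, color_bin=0.5):
    """Map an observation to a (sky cell, color bin) bucket. Bin sizes are
    illustrative, not taken from SDSS."""
    cell = (int(ra_deg // cell_size), int(dec_deg // cell_size))
    color = int(color_index // color_bin)
    return cell, color

buckets = defaultdict(list)
observations = [
    {"ra": 12.3, "dec": 45.6, "color": 0.7, "id": "obj1"},
    {"ra": 13.1, "dec": 44.9, "color": 0.6, "id": "obj2"},  # same bucket as obj1
    {"ra": 250.0, "dec": -5.0, "color": 1.9, "id": "obj3"},
]
for obs in observations:
    buckets[partition_key(obs["ra"], obs["dec"], obs["color"])].append(obs["id"])

# A query for "blue-ish objects near (12, 45)" now touches one small bucket
# instead of scanning the whole catalog.
print(dict(buckets))
```

The challenge the researchers describe is choosing the partitioning dimensions so that real queries actually cluster this way.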

"We want to try to make it possible for data that will be of interest to the same kinds of queries to be ‘located' close together so they are easier to find," says Ethan Vishniac, director of the Johns Hopkins Center for Astrophysical Sciences, also a collaborative member.

Another concern is that these huge chunks of information will probably be stored at geographically different locations. Some next-generation science projects involve so much information, according to Szalay, that it cannot be brought to researchers across computer networks. Arranging ways to simultaneously access data in these different locations without ever bringing it together in one database, a technique called "distributed processing," is the second major component of research supported by the NSF grant.
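The distributed-processing idea can be sketched as follows: each site runs the query against its own holdings and ships back only the small result set, so the raw data never crosses the network. The site names and record fields below are stand-ins for illustration, not the consortium's actual systems.

```python
# Stand-ins for geographically separate archives; in reality each would be
# reached by a network call, and only the query result would travel.
SITES = {
    "site_a": [{"id": "a", "brightness": 9.1}, {"id": "b", "brightness": 3.2}],
    "site_b": [{"id": "c", "brightness": 8.7}, {"id": "d", "brightness": 1.0}],
}

def local_query(records, min_brightness):
    """Runs at the site that holds the data; returns only the matches."""
    return [r["id"] for r in records if r["brightness"] >= min_brightness]

def distributed_query(min_brightness):
    """Merge small per-site results without ever centralizing the raw data."""
    results = []
    for site, records in SITES.items():
        results.extend(local_query(records, min_brightness))
    return results

print(distributed_query(8.0))  # ['a', 'c']
```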

The third component of the research supported by the NSF grant will improve a technique called "parallel" querying. This involves searching different locations at the same time, not unlike sending an army of librarians out to work in several large libraries at once. Researchers will strive to make these search agents smarter and more independent by improving the software they use. To test their approaches, researchers will use data from the SDSS, from the CERN particle collider and from GALEX, a sky-mapping survey that covers the same areas as SDSS but measures different forms of radiation.
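The "army of librarians" analogy maps naturally onto a thread pool: the same search is issued against several archives at once and the hits are combined as they come back. The archive names and contents below are purely hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical archives; in practice each search would be a remote query.
ARCHIVES = {
    "sdss": ["galaxy-101", "quasar-7", "galaxy-222"],
    "galex": ["galaxy-101", "nebula-3"],
    "cern": ["event-55", "event-56"],
}

def search(archive_name, keyword):
    """One 'librarian': scan a single archive for matching entries."""
    return [item for item in ARCHIVES[archive_name] if keyword in item]

def parallel_search(keyword):
    """Dispatch one searcher per archive simultaneously, then merge the hits."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(search, name, keyword) for name in ARCHIVES]
        hits = []
        for f in futures:
            hits.extend(f.result())
    return sorted(hits)

print(parallel_search("galaxy"))  # ['galaxy-101', 'galaxy-101', 'galaxy-222']
```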

"Data sets that are astronomical in every sense of that word are great test beds for computer scientists to experiment with to develop novel techniques for visualizing, organizing, and querying information," says Michael Goodrich, Hopkins professor of computer science and a member of the collaborative.

Additional collaborators include physicist Harvey Newman, research scientist Julian Bunn and astronomer Chris Martin of Caltech; physicist Thomas Nash of Fermilab; computer scientist Jim Gray of Microsoft; and astronomers Ani Thakar and Peter Kunszt of Hopkins.

The $2.5 million NSF grant is one of 31 announced by NSF as part of a new effort to support "knowledge and distributed intelligence" projects. The grants are focused on efforts to apply new computer technology across multidisciplinary areas in science and engineering.

Story Source:

The above story is based on materials provided by Johns Hopkins University. Note: Materials may be edited for content and length.

Cite This Page:

Johns Hopkins University. "Hopkins-Led Team Developing New Ways To Handle Flood Of Data." ScienceDaily. ScienceDaily, 27 September 1999. <www.sciencedaily.com/releases/1999/09/990924115544.htm>.
