Featured Research

from universities, journals, and other organizations

Hopkins-Led Team Developing New Ways To Handle Flood Of Data

September 27, 1999
Johns Hopkins University
The fountain of scientific data has become a fire hose and is turning into a raging river. A Johns Hopkins-led consortium is working on ways to handle the information overload faced by scientists.

The fountain of information at the heart of science has become a fire hose, and an increase to river-like volumes is on the way. The CERN particle collider in Geneva, Switzerland, for instance, currently produces more than 1 petabyte, or about 1,000,000,000,000,000 bytes, of information every year. The words and other text in all the books in the Library of Congress, in contrast, add up to only about one-thousandth of that information, or one terabyte (1 trillion bytes). And CERN is just one example of the tremendous information-generating powers of modern science.

"Our current ways of doing science are very much based on the concept that our data sets are so small that we can sort of 'eyeball' the whole thing and locate the interesting data," says Alexander Szalay, Alumni Centennial Professor of Physics and Astronomy at The Johns Hopkins University. "And with the data sets we are getting in an increasing number of areas of science, this is just not going to be feasible. So we have to do something drastically different."

Szalay leads an interdisciplinary team of researchers developing new ways to store, access and search large volumes of data. Participants in the Hopkins-led collaborative include scientists from Caltech, the U.S. Department of Energy's Fermilab and Microsoft Corp. They have been working together for several years already; this month they will receive the first formal support for their efforts in a 3-year, $2.5 million grant from the National Science Foundation.

"This problem is of course much bigger than astronomy or particle physics," Szalay says. "I think this is actually becoming more a problem for the whole society. We are choking on information, and we have to sort out the relevant from the irrelevant. So I think what we're doing is a very interesting test bed for experimenting with new technologies that could have broader applications elsewhere."

Particle physicists were among the first to have to deal with huge quantities of information. Their work to manage that information led to the development of tools and techniques that found uses beyond the realm of the physics lab, notes Aihud Pevsner, Jacob P. Hain Professor of Physics and Astronomy at Johns Hopkins and a member of the collaborative.

"To help work with large data sets at CERN, Tim Berners-Lee invented in 1989 what later became the World Wide Web," says Pevsner. "He did it because the tools that they had at the time were inadequate for the distribution of the data sets they were working with."

Pevsner, a particle physicist, will be one of 500 American physicists working at the Large Hadron Collider (LHC) at CERN, the world's most powerful particle collider. The LHC is expected to produce 100-petabyte data sets.

Szalay is a researcher for the Sloan Digital Sky Survey (SDSS), an effort he calls the "cosmic genome project," which will map everything visible in several large chunks of the northern and southern sky. SDSS starts next year, and before it is over he estimates that it will produce 40 terabytes of data with a 2-terabyte catalog.

Such a high volume of data reduces the chances that astronomers will miss gathering important information, but it also makes it harder to find that information among what's been gathered. "When you have so much data that it chokes you, you have to keep breaking it up into smaller chunks until it no longer chokes you," Szalay says.

Developing better ways to break down large quantities of information is the first major component of research under the NSF grant. The SDSS information, for example, might be broken up both by the area of the sky that the data comes from and by the color of the objects observed in the sky. The challenge, though, is to make sure that this process of partitioning the data improves the scientists' abilities to see important patterns and irregularities in the data.

"We want to try to make it possible for data that will be of interest to the same kinds of queries to be 'located' close together so they are easier to find," says Ethan Vishniac, director of the Johns Hopkins Center for Astrophysical Sciences, also a collaborative member.
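As a rough illustration of the partitioning idea described above, the following Python sketch groups catalog objects into buckets keyed by sky region and color, so that objects likely to match the same query sit together. It is not the collaboration's method: the function names, the record fields (`ra`, `dec`, `color`), the 10-degree grid and the 0.5 color step are all assumptions chosen for the example.

```python
from collections import defaultdict


def bucket_key(ra_deg, dec_deg, color_index, cell_deg=10.0, color_step=0.5):
    """Map one object to a (sky cell, color bin) bucket.

    The grid size and color step are illustrative assumptions,
    not values taken from the SDSS project.
    """
    ra_cell = int((ra_deg % 360.0) // cell_deg)    # right ascension, 0..360 degrees
    dec_cell = int((dec_deg + 90.0) // cell_deg)   # declination, shifted from -90..90
    color_bin = int(color_index // color_step)
    return (ra_cell, dec_cell, color_bin)


def partition(objects):
    """Group objects so that neighbors in sky position and color share a bucket."""
    buckets = defaultdict(list)
    for obj in objects:
        key = bucket_key(obj["ra"], obj["dec"], obj["color"])
        buckets[key].append(obj)
    return buckets


# Hypothetical catalog entries, invented for the example.
catalog = [
    {"id": 1, "ra": 12.0, "dec": 5.0, "color": 0.3},
    {"id": 2, "ra": 14.5, "dec": 7.2, "color": 0.4},
    {"id": 3, "ra": 200.0, "dec": -30.0, "color": 1.2},
]
buckets = partition(catalog)
```

Here objects 1 and 2 land in the same bucket because they are close in both position and color, while object 3 is stored separately. A query for blue objects in one patch of sky would then need to read only a few buckets rather than the whole catalog.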

Another concern is that these huge chunks of information will probably be stored at geographically different locations. Some next-generation science projects involve so much information, according to Szalay, that it cannot be brought to researchers across computer networks. Arranging ways to simultaneously access data in these different locations without ever bringing it together in one database, a technique called "distributed processing," is the second major component of research supported by the NSF grant.
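One common way to work with data that cannot be moved is to send the question to each site and bring back only a small summary. The Python sketch below shows that pattern under stated assumptions: the site names, the numeric records and the functions `local_summary` and `combine` are hypothetical, and plain in-memory lists stand in for remote archives.

```python
def local_summary(records, min_value):
    """Runs where the data lives: return only (count, total) for matching records."""
    matches = [r for r in records if r >= min_value]
    return len(matches), sum(matches)


def combine(summaries):
    """Runs at the coordinator: merge the small per-site summaries."""
    count = sum(c for c, _ in summaries)
    total = sum(t for _, t in summaries)
    mean = total / count if count else None
    return count, mean


# Hypothetical sites; in practice each list would be a large remote archive.
sites = {
    "site_a": [1.0, 4.0, 6.0],
    "site_b": [5.0, 2.0],
    "site_c": [9.0],
}

summaries = [local_summary(data, 4.0) for data in sites.values()]
count, mean = combine(summaries)
```

Only two numbers per site cross the network in this sketch, regardless of how large each archive is. The combined answer (four matching records with a mean of 6.0) is the same as if all the data had been gathered into one database first.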

The third component of the NSF grant will improve a technique called "parallel" querying. This involves searching in different locations at the same time, not unlike sending out an army of librarians to search or work in several different, large libraries at once. Researchers will strive to make these search agents smarter and more independent by improving the software they use. To test their efforts at dealing with these challenges, researchers will use data from the SDSS, from the CERN particle collider and from GALEX, a sky-mapping survey that covers the same areas as SDSS but measures different forms of radiation.
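The "army of librarians" picture can be sketched in a few lines of Python using the standard library's thread pool. This is only a toy model, assuming that threads on one machine stand in for search agents at separate sites; the archive names, the records and the functions `search_archive` and `parallel_query` are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def search_archive(name, records, predicate):
    """One 'librarian': scan a single archive and return the matching records."""
    return [(name, r) for r in records if predicate(r)]


def parallel_query(archives, predicate):
    """Send the same query to every archive at once and gather the answers."""
    with ThreadPoolExecutor(max_workers=max(1, len(archives))) as pool:
        futures = [
            pool.submit(search_archive, name, records, predicate)
            for name, records in archives.items()
        ]
        results = []
        for future in futures:  # collect in submission order for stable output
            results.extend(future.result())
    return results


# Hypothetical archives; threads here stand in for remote search agents.
archives = {
    "archive_a": [3, 12, 25],
    "archive_b": [8, 40],
    "archive_c": [11],
}
hits = parallel_query(archives, lambda r: r > 10)
```

Each archive is searched independently, so the total waiting time is governed by the slowest archive rather than by the sum of all searches.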

"Data sets that are astronomical in every sense of that word are great test beds for computer scientists to experiment with to develop novel techniques for visualizing, organizing, and querying information," says Michael Goodrich, Hopkins professor of computer science and a member of the collaborative.

Additional collaborators include physicist Harvey Newman, research scientist Julian Bunn and astronomer Chris Martin of Caltech; physicist Thomas Nash of Fermilab; computer scientist Jim Gray of Microsoft; and astronomers Ani Thakar and Peter Kunszt of Hopkins.

The $2.5 million NSF grant is one of 31 announced by NSF as part of a new effort to support "knowledge and distributed intelligence" projects. The grants are focused on efforts to apply new computer technology across multidisciplinary areas in science and engineering.

Story Source:

The above story is based on materials provided by Johns Hopkins University. Note: Materials may be edited for content and length.

Cite This Page:

Johns Hopkins University. "Hopkins-Led Team Developing New Ways To Handle Flood Of Data." ScienceDaily. ScienceDaily, 27 September 1999. <www.sciencedaily.com/releases/1999/09/990924115544.htm>.
