Case Researchers Discover Methods To Find 'Needles In Haystack' In Data

Date: December 6, 2005
Source: Case Western Reserve University
Summary: Case Western Reserve University researchers have created statistical techniques that improve the chances of detecting a signal in large data sets. Beyond searching for the "needle in the haystack" in particle physics, the techniques have applications in discovering a new galaxy, monitoring transactions for fraud and security risks, identifying the carrier of a virulent disease among millions of people, and detecting cancerous tissue in a mammogram.

Simulation and Visualization Credit: Ramani S. Pilla, Catherine Loader and Cyrus C. Taylor

Case faculty members Ramani Pilla and Catherine Loader of the statistics department and Cyrus Taylor of the physics department report their findings in the article "A New Technique for Finding Needles in Haystacks: A Geometric Approach to Distinguishing between a New Source and Random Fluctuations," published December 2 in the journal Physical Review Letters.

"As haystacks of information grow ever larger--and the needles ever smaller--the search for a signal becomes increasingly difficult to find using traditional approaches. There is a need for sophisticated new statistical methods," the researchers report.

Researchers working with large amounts of data face the fundamental problem of distinguishing a real signal from random variation. In many practical problems, a suspected signal may be only a small blip against a noisy experimental background.

The Case team developed a technique built on comparing a set of summary characteristics of each subregion of the observations with the background variation. Using these characteristics, the method searches for small regions that differ significantly from the background, by more than can plausibly be attributed to random chance.
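To make the subregion comparison concrete, here is a minimal sketch of a generic sliding-window scan over a binned data set. It is not the researchers' method: the flat background, the window width, and the names `scan_windows`, `observed`, and `expected` are illustrative assumptions, and each window's excess is scored with a simple Gaussian approximation to Poisson counting noise, (O - E) / sqrt(E).

```python
import math


def scan_windows(observed, expected, width):
    """Return (z, start) for the window of `width` consecutive bins with the largest excess.

    Hypothetical helper: z = (O - E) / sqrt(E) is a Gaussian approximation to
    the significance of a Poisson excess of observed over expected counts.
    """
    best_z, best_start = -math.inf, None
    for start in range(len(observed) - width + 1):
        o = sum(observed[start:start + width])   # observed counts in this subregion
        e = sum(expected[start:start + width])   # background expectation in this subregion
        z = (o - e) / math.sqrt(e)
        if z > best_z:
            best_z, best_start = z, start
    return best_z, best_start


# Illustrative data: flat background of 100 counts per bin, excess of 40 per bin in bins 40-44.
expected = [100.0] * 100
observed = [100.0] * 100
for i in range(40, 45):
    observed[i] = 140.0

z, start = scan_windows(observed, expected, width=5)
print(start, round(z, 2))  # 40 8.94
```

The scan only reports where the largest excess sits. Whether that excess is large enough to count as a discovery is a separate question, because the largest of many windows tends to look impressive even in pure background.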

"Methods used in high-energy particle physics problems traditionally have searched for any departure from a background model; that is, anything that is not a haystack," said Pilla, the project leader. "Our method efficiently incorporates information about the type of disorder expected, thereby enabling us to find the signal of interest more accurately."

At the core of the breakthrough is the idea of posing the problem within a "hypothesis-based testing" paradigm for detecting statistical disorder in the data. The method also exploits the flexibility of a long-established geometric formula, which significantly enhances its ability to distinguish a signal.
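The article does not spell the formula out. A standard example of this kind of geometric result is Hotelling's 1939 tube formula, shown here in its simplest textbook form for orientation only; the paper's own formulation may differ. Assume each candidate signal shape is normalized to a unit vector u(t) in n-dimensional space, that varying a single parameter t over an interval [a, b] traces a smooth, non-closed curve of length κ0 on the unit sphere, and that the normalized noise vector U is uniformly distributed on that sphere. Then, for w close enough to 1,

```latex
P\Bigl(\max_{a \le t \le b} \langle u(t), U \rangle \ge w\Bigr)
  = \frac{\kappa_0}{2\pi}\,\bigl(1 - w^2\bigr)^{(n-2)/2}
  + P\bigl(\langle u(a), U \rangle \ge w\bigr)
```

The first term is the area of the tube of angular radius arccos w around the curve and the second is the pair of half-caps at its two ends, each expressed as a fraction of the sphere's total area; the formula is exact once w is close enough to 1 that the tube does not overlap itself. For n = 3 it reduces to (κ0 / 2π) sqrt(1 - w²) + (1 - w) / 2, the familiar band-plus-caps area on an ordinary sphere divided by 4π.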

The researchers said the challenge is twofold: defining efficient test statistics and determining the critical cutoff, the threshold that tells the scientist whether an apparent excess is random variation or a genuine signal. Because the detection problem involves a large number of comparisons, the researchers caution that experimentalists must guard against being fooled into false discoveries by random variation.
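The multiple-comparisons trap, and why the critical cutoff must account for it, can be illustrated with a short simulation. The sketch assumes background-only data in which each bin fluctuates as an independent standard normal (a large-count approximation to Poisson noise), and uses a hypothetical helper, `calibrate_cutoff`, to find the threshold that the maximum over all windows exceeds only 5% of the time. This controls the experiment-wise error rate by brute-force Monte Carlo simulation; it illustrates the goal, not the geometric calculation the researchers used.

```python
import math
import random
from statistics import NormalDist


def max_window_z(z_bins, width):
    """Largest standardized sum over all windows of `width` consecutive bins."""
    prefix = [0.0]
    for v in z_bins:
        prefix.append(prefix[-1] + v)
    # Dividing each window sum of `width` unit-variance bins by sqrt(width)
    # makes every window statistic standard normal under background only.
    return max((prefix[s + width] - prefix[s]) / math.sqrt(width)
               for s in range(len(z_bins) - width + 1))


def calibrate_cutoff(n_bins, width, alpha=0.05, n_sims=2000, seed=0):
    """Monte Carlo (1 - alpha) quantile of the maximum window statistic
    when the data contain background only (hypothetical helper)."""
    rng = random.Random(seed)
    maxima = sorted(
        max_window_z([rng.gauss(0.0, 1.0) for _ in range(n_bins)], width)
        for _ in range(n_sims)
    )
    # About alpha * n_sims of the simulated maxima lie above this value.
    return maxima[math.ceil((1 - alpha) * n_sims) - 1]


naive = NormalDist().inv_cdf(0.95)   # cutoff appropriate for a single, pre-chosen window
cutoff = calibrate_cutoff(n_bins=100, width=5)
print(f"single-window cutoff {naive:.3f}, experiment-wise cutoff {cutoff:.2f}")
```

With roughly a hundred overlapping windows, the calibrated cutoff should come out well above the single-window value of about 1.645; a per-window threshold would therefore flag a spurious signal in background-only data far more often than 5% of the time.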

"The experimenter wants to control the experiment-wise error rate: if there is nothing in the data, then there must be minimal probability of falsely discovering a signal. On the other hand, we want to maximize our chance of discovering any real signal that may be present in the massive data set," said Loader.

"The probabilistic problem associated with this scenario is reduced to one of finding the areas of certain regions on the surface of high-dimensional spheres," explains Pilla.

The Case researchers then exploited geometric methods pioneered in 1939 by Harold Hotelling and Hermann Weyl. They tested the techniques on computer-simulated particle physics experiments that mimic real collider experiments and showed that the new technique significantly increased detection probabilities.
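A much simpler simulation can show how well a Hotelling-Weyl style approximation predicts false-alarm rates. The sketch assumes Gaussian noise of known variance, a hypothetical family of Gaussian-shaped bumps 2 bins wide whose unknown center ranges over positions 8 to 32 of a 40-bin histogram, and the Gaussian (large-sample) form of the tube approximation, P(max > c) ≈ (κ0 / 2π) exp(-c² / 2) + 1 - Φ(c), where κ0 is the length of the curve traced by the normalized bump shapes and Φ is the standard normal distribution function. All function names are illustrative; this is not the researchers' code.

```python
import math
import random
from statistics import NormalDist


def unit_templates(n_bins, centers, width):
    """Gaussian-bump signal shapes at each candidate center, scaled to unit length."""
    family = []
    for t in centers:
        f = [math.exp(-((i - t) ** 2) / (2 * width ** 2)) for i in range(n_bins)]
        norm = math.sqrt(sum(v * v for v in f))
        family.append([v / norm for v in f])
    return family


def curve_length(family):
    """kappa_0: length of the path the unit templates trace on the sphere,
    approximated by summing chords between neighbouring templates."""
    return sum(math.dist(a, b) for a, b in zip(family, family[1:]))


def tube_tail(kappa0, c):
    """Gaussian-limit tube approximation to P(max_t <u(t), Z> > c)."""
    return kappa0 / (2 * math.pi) * math.exp(-c * c / 2) + (1 - NormalDist().cdf(c))


def simulated_tail(family, c, n_sims=2000, seed=1):
    """Fraction of background-only data sets Z ~ N(0, I) whose best template match exceeds c."""
    rng = random.Random(seed)
    n = len(family[0])
    hits = 0
    for _ in range(n_sims):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        best = max(sum(ui * zi for ui, zi in zip(u, z)) for u in family)
        if best > c:
            hits += 1
    return hits / n_sims


centers = [8 + 0.5 * k for k in range(49)]   # candidate centers 8.0, 8.5, ..., 32.0
family = unit_templates(n_bins=40, centers=centers, width=2.0)
kappa0 = curve_length(family)
print(f"kappa0 = {kappa0:.2f}")
print(f"tube approximation {tube_tail(kappa0, 2.5):.4f}, "
      f"simulation {simulated_tail(family, 2.5):.4f}")
```

With these settings the tube approximation should land close to the simulated rate, typically a little above it, since it behaves as an upper bound and the simulation only checks a grid of positions. Both are roughly ten times the single-template tail 1 - Φ(2.5) ≈ 0.006, which is the error a naive analysis would make.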

"In high-energy particle physics and astrophysics problems, chi-square goodness-of-fit tests are widely employed, although they have relatively low power to detect the signal," notes Taylor. "Through my collaborative work with Professors Pilla and Loader, we will be able to develop powerful statistical tests for detecting a signal from noisy data with high probability, a fundamental problem encountered in many scientific disciplines."

Taylor added that "conducting experiments in a particle collider may cost tens of millions of dollars. Improving efficiency in the analysis of experimental results can lead to enormous cost savings. Furthermore, we can obtain the same results with much smaller experiments, or effectively find much smaller departures from the background model."
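Taylor's point about power can be illustrated with a small simulation comparing an omnibus chi-square goodness-of-fit test with a test aimed at one specific kind of departure. Every setting is an assumption: a 100-bin histogram with a known flat background of 100 counts per bin, counts approximated as Gaussian with variance equal to the mean, a hypothetical excess of 8 counts per bin in bins 40 to 44, and a targeted test that knows where the excess would appear. The targeted test is a stand-in for the general idea of using the expected shape of the signal, not the researchers' geometric method, and the chi-square cutoff uses the Wilson-Hilferty approximation.

```python
import math
import random
from statistics import NormalDist


def chi2_critical(df, alpha):
    """Wilson-Hilferty approximation to the upper-alpha quantile of a chi-square(df)."""
    z = NormalDist().inv_cdf(1 - alpha)
    h = 2 / (9 * df)
    return df * (1 - h + z * math.sqrt(h)) ** 3


def power_comparison(bump, n_bins=100, background=100.0, lo=40, hi=45,
                     alpha=0.05, n_sims=2000, seed=0):
    """Estimated rejection rates (chi-square, targeted) when bins lo..hi-1 carry `bump` extra counts."""
    rng = random.Random(seed)
    chi2_cut = chi2_critical(n_bins, alpha)   # background fully specified, so df = n_bins
    z_cut = NormalDist().inv_cdf(1 - alpha)   # one-sided test at the assumed bump location
    means = [background + (bump if lo <= i < hi else 0.0) for i in range(n_bins)]
    chi2_hits = targeted_hits = 0
    for _ in range(n_sims):
        # Gaussian stand-in for Poisson counts: mean and variance both equal to `means`.
        counts = [m + math.sqrt(m) * rng.gauss(0.0, 1.0) for m in means]
        chi2 = sum((c - background) ** 2 / background for c in counts)
        excess = sum(counts[lo:hi]) - background * (hi - lo)
        z = excess / math.sqrt(background * (hi - lo))
        if chi2 > chi2_cut:
            chi2_hits += 1
        if z > z_cut:
            targeted_hits += 1
    return chi2_hits / n_sims, targeted_hits / n_sims


print("no signal: ", power_comparison(bump=0.0))
print("small bump:", power_comparison(bump=8.0))
```

With no signal, both rejection rates should sit near the nominal 5%. With the small bump, the targeted test should reject roughly half the time while the chi-square test stays far lower, because the chi-square statistic spreads its attention over all 100 bins. Knowing the bump's location flatters the targeted test; with an unknown location its threshold would have to be raised to account for the many positions searched.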

"Detecting a real signal (the needle) present in random and chaotic data (the haystack) will lead to scientific success," conclude the researchers.

###

This research was supported by the National Science Foundation and the Office of Naval Research. For more information, visit http://stat.case.edu/~pillar/PRL/PRL.htm.

Case Western Reserve University is among the nation's leading research institutions. Founded in 1826 and shaped by the unique merger of the Case Institute of Technology and Western Reserve University, Case is distinguished by its strengths in education, research, service, and experiential learning. Located in Cleveland, Case offers nationally recognized programs in the Arts and Sciences, Dental Medicine, Engineering, Law, Management, Medicine, Nursing, and Social Sciences. http://www.case.edu.


Story Source:

The above story is based on materials provided by Case Western Reserve University. Note: Materials may be edited for content and length.


Cite This Page:

Case Western Reserve University. "Case Researchers Discover Methods To Find 'Needles In Haystack' In Data." ScienceDaily. ScienceDaily, 6 December 2005. <www.sciencedaily.com/releases/2005/12/051205161956.htm>.
