Featured Research

from universities, journals, and other organizations

Collecting just the right data: Algorithm helps identify which data to target

Date:
July 25, 2014
Source:
Massachusetts Institute of Technology
Summary:
Much artificial-intelligence research addresses the problem of making predictions based on large data sets. An obvious example is the recommendation engines at retail sites like Amazon and Netflix. But some types of data are harder to collect -- information about geological formations thousands of feet underground, for instance. And in other applications -- such as trying to predict the path of a storm -- there may just not be enough time to crunch all the available data. When you can't collect all the data you need, a new algorithm tells you which to target.

Calculating the mutual information between two nodes in a graph is like injecting blue dye into one of them and measuring the concentration of blue at the other. Crucial to the new algorithm are the elimination of loops in the graph (orange) and a technique that prevents intermediary nodes (black) from distorting the long-range calculation of mutual information (blue).
Credit: Illustration: Jose-Luis Olivares/MIT (based on images courtesy of the researchers)

Much artificial-intelligence research addresses the problem of making predictions based on large data sets. An obvious example is the recommendation engines at retail sites like Amazon and Netflix.

But some types of data are harder to collect than online click histories -- information about geological formations thousands of feet underground, for instance. And in other applications -- such as trying to predict the path of a storm -- there may just not be enough time to crunch all the available data.

Dan Levine, an MIT graduate student in aeronautics and astronautics, and his advisor, Jonathan How, the Richard Cockburn Maclaurin Professor of Aeronautics and Astronautics, have developed a new technique that could help with both problems. For a range of common applications in which data is either difficult to collect or too time-consuming to process, the technique can identify the subset of data items that will yield the most reliable predictions. So geologists trying to assess the extent of underground petroleum deposits, or meteorologists trying to forecast the weather, can make do with just a few, targeted measurements, saving time and money.

Levine and How, who presented their work at the Uncertainty in Artificial Intelligence conference this week, consider the special case in which something about the relationships between data items is known in advance. Weather prediction provides an intuitive example: Measurements of temperature, pressure, and wind velocity at one location tend to be good indicators of measurements at adjacent locations, or of measurements at the same location a short time later, but the correlation grows weaker the farther out you move either geographically or chronologically.

Graphic Content

Such correlations can be represented by something called a probabilistic graphical model. In this context, a graph is a mathematical abstraction consisting of nodes -- typically depicted as circles -- and edges -- typically depicted as line segments connecting nodes. A network diagram is one example of a graph; a family tree is another. In a probabilistic graphical model, the nodes represent variables, and the edges indicate which variables are directly correlated and how strongly.

Levine and How developed an algorithm that can efficiently calculate just how much information any node in the graph gives you about any other -- what in information theory is called "mutual information." As Levine explains, one of the obstacles to performing that calculation efficiently is the presence of "loops" in the graph, or nodes that are connected by more than one path.

Calculating mutual information between nodes, Levine says, is kind of like injecting blue dye into one of them and then measuring the concentration of blue at the other. "It's typically going to fall off as we go further out in the graph," Levine says. "If there's a unique path between them, then we can compute it pretty easily, because we know what path the blue dye will take. But if there are loops in the graph, then it's harder for us to compute how blue other nodes are because there are many different paths."
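
The dye analogy can be sketched numerically. The example below is an illustration only, not the researchers' algorithm: it assumes a simple chain of scalar Gaussian variables in which each edge carries a correlation coefficient (the values are made up). On a unique path, the end-to-end correlation is the product of the edge correlations, and the mutual information between two jointly Gaussian scalars with correlation rho is -0.5 * ln(1 - rho^2). The function names are hypothetical.

```python
import math

def path_correlation(edge_correlations):
    """Correlation between the two ends of a unique path of scalar Gaussian nodes."""
    rho = 1.0
    for r in edge_correlations:
        rho *= r
    return rho

def gaussian_mutual_information(rho):
    """Mutual information (in nats) between two jointly Gaussian scalars."""
    return -0.5 * math.log(1.0 - rho ** 2)

# Hypothetical chain: every edge has correlation 0.9 (assumed for illustration).
edges = [0.9, 0.9, 0.9, 0.9]
mi_by_distance = [
    gaussian_mutual_information(path_correlation(edges[:k]))
    for k in range(1, len(edges) + 1)
]
for hops, mi in enumerate(mi_by_distance, start=1):
    print(f"{hops} hop(s): {mi:.3f} nats")
```

Each additional hop multiplies the correlation by a factor below 1, so the printed mutual information shrinks with distance, much as the dye grows fainter.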

So the first step in the researchers' technique is to calculate "spanning trees" for the graph. A tree is just a graph with no loops: In a family tree, for instance, a loop might mean that someone was both parent and sibling to the same person. A spanning tree is a tree that touches all of a graph's nodes but dispenses with the edges that create loops.
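
The article does not say how the researchers construct their spanning trees. As a minimal sketch of the general idea, the example below uses Kruskal's algorithm, a standard textbook method, to keep the strongest edges that do not close a loop. The choice of a maximum-weight tree, the edge weights, and the function name are all assumptions made for illustration.

```python
def maximum_spanning_tree(num_nodes, weighted_edges):
    """Kruskal's algorithm with union-find: keep the heaviest edges that close no loop."""
    parent = list(range(num_nodes))

    def find(x):
        # Follow parent pointers to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    # Visit edges from strongest to weakest.
    for weight, u, v in sorted(weighted_edges, reverse=True):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # u and v are not yet connected, so no loop is created
            parent[root_u] = root_v
            tree.append((u, v, weight))
    return tree

# Hypothetical 4-node graph with one loop (0-1-2-0) plus the edge 2-3.
# Each entry is (weight, node_a, node_b).
edges = [(0.9, 0, 1), (0.8, 1, 2), (0.3, 0, 2), (0.7, 2, 3)]
tree = maximum_spanning_tree(4, edges)
print(tree)
```

In this toy graph the weakest edge of the loop, between nodes 0 and 2, is dropped, leaving three edges that still touch all four nodes.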

Betting the Spread

Most of the nodes that remain in the graph, however, are "nuisances," meaning that they don't contain much useful information about the node of interest. The key to Levine and How's technique is a way to use those nodes to navigate the graph without letting their short-range influence distort the long-range calculation of mutual information.

That's possible, Levine explains, because the probabilities represented by the graph are Gaussian, meaning that they follow the bell curve familiar as the model of, for instance, the dispersion of characteristics in a population. A Gaussian distribution is fully characterized by just two parameters: the mean -- say, the average height in a population -- and the variance -- a measure of how widely the bell spreads out.

"The uncertainty in the problem is really a function of the spread of the distribution," Levine says. "It doesn't really depend on where the distribution is centered in space." As a consequence, it's often possible to calculate variance across a probabilistic graphical model without relying on the specific values of the nodes. "The usefulness of data can be assessed before the data itself becomes available," Levine says.

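Levine's point about spread can be checked with the standard formulas for conditioning one Gaussian variable on another. The sketch below is not the researchers' code; the prior values and function names are hypothetical. It shows that the observed value shifts the posterior mean but leaves the posterior variance unchanged, so the information a measurement would provide can be computed before the measurement is taken.

```python
import math

def condition_on_measurement(mean_x, var_x, mean_y, var_y, cov_xy, observed_y):
    """Posterior mean and variance of X after observing Y = observed_y (jointly Gaussian)."""
    gain = cov_xy / var_y
    post_mean = mean_x + gain * (observed_y - mean_y)
    post_var = var_x - gain * cov_xy  # does not involve observed_y
    return post_mean, post_var

def information_gain(var_x, var_y, cov_xy):
    """Mutual information I(X; Y) in nats, computed from variances and covariance alone."""
    post_var = var_x - cov_xy ** 2 / var_y
    return 0.5 * math.log(var_x / post_var)

# Hypothetical prior: X is the quantity to predict, Y a candidate measurement.
prior = dict(mean_x=10.0, var_x=4.0, mean_y=0.0, var_y=1.0, cov_xy=1.5)

# Two different readings of Y give different means but the same variance.
mean_a, var_a = condition_on_measurement(observed_y=-2.0, **prior)
mean_b, var_b = condition_on_measurement(observed_y=3.0, **prior)
print(mean_a, var_a)
print(mean_b, var_b)

# The value of measuring Y, known before any reading is available.
gain_nats = information_gain(prior["var_x"], prior["var_y"], prior["cov_xy"])
print(gain_nats)
```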

Story Source:

The above story is based on materials provided by Massachusetts Institute of Technology. The original article was written by Larry Hardesty. Note: Materials may be edited for content and length.


Cite This Page:

Massachusetts Institute of Technology. "Collecting just the right data: Algorithm helps identify which data to target." ScienceDaily. ScienceDaily, 25 July 2014. <www.sciencedaily.com/releases/2014/07/140725110811.htm>.
Massachusetts Institute of Technology. (2014, July 25). Collecting just the right data: Algorithm helps identify which data to target. ScienceDaily. Retrieved December 21, 2014 from www.sciencedaily.com/releases/2014/07/140725110811.htm
Massachusetts Institute of Technology. "Collecting just the right data: Algorithm helps identify which data to target." ScienceDaily. www.sciencedaily.com/releases/2014/07/140725110811.htm (accessed December 21, 2014).
