Featured Research

from universities, journals, and other organizations

Seeing the forest for the trees: Object recognition systems that break images into ever smaller parts should be much more efficient

Date:
May 17, 2010
Source:
Massachusetts Institute of Technology
Summary:
Object recognition is one of the core topics in computer vision research: After all, a computer that can see isn't much use if it has no idea what it's looking at. Researchers have developed new techniques that should make object recognition systems much easier to build and should enable them use computer memory more efficiently.

A new object recognition system developed by researchers at MIT and UCLA looks for rudimentary visual features shared by multiple examples of the same object. Then it looks for combinations of those features shared by multiple examples, and combinations of those combinations, and so on, until it has assembled a model of the object that resembles a line drawing.
Credit: Images courtesy of Long (Leo) Zhu

Object recognition is one of the core topics in computer vision research: After all, a computer that can see isn't much use if it has no idea what it's looking at. Researchers at MIT, working with colleagues at the University of California, Los Angeles, have developed new techniques that should make object recognition systems much easier to build and should enable them use computer memory more efficiently.

A conventional object recognition system, when trying to discern a particular type of object in a digital image, will generally begin by looking for the object's salient features. A system built to recognize faces, for instance, might look for things resembling eyes, noses and mouths and then determine whether they have the right spatial relationships with each other. The design of such systems, however, usually requires human intuition: A programmer decides which parts of the objects are the right ones to key in on. That means that for each new object added to the system's repertoire, the programmer has to start from scratch, determining which of the object's parts are the most important.

It also means that a system designed to recognize millions of different types of objects would become unmanageably large. Each object would have its own, unique set of three or four parts, but the parts would look different from different perspectives, and cataloguing all those perspectives would take an enormous amount of computer memory.

In a paper that they'll present at the Institute of Electrical and Electronics Engineers' Conference on Computer Vision and Pattern Recognition in June, postdoc Long (Leo) Zhu and Professors Bill Freeman and Antonio Torralba, all of MIT's Computer Science and Artificial Intelligence Laboratory, and Yuanhao Chen and Alan Yuille of UCLA describe an approach that solves both of these problems at once. Like most object-recognition systems, their system learns to recognize new objects by being "trained" with digital images of labeled objects. But it doesn't need to know in advance which of the objects' features it should look for. For each labeled object, it first identifies the smallest features it can -- often just short line segments. Then it looks for instances in which these low-level features are connected to each other, forming slightly more sophisticated shapes. Then it looks for instances in which these more sophisticated shapes are connected to each other, and so on, until it's assembled a hierarchical catalogue of increasingly complex parts whose top layer is a model of the whole object.

Economies of scale

Once the system has assembled its catalogue from the bottom up, it goes through it from the top down, winnowing out all the redundancies. In the parts catalogue for a horse seen in profile, for instance, the second layer from the top might include two different representations of the horse's rear: One could include the rump, one rear leg and part of the belly; the other might include the rump and both rear legs. But it could turn out that in the vast majority of cases where the system identifies one of these "parts," it identifies the other as well. So it will simply cut one of them out of its hierarchy.

Even though the hierarchical approach adds new layers of information about digitally depicted objects, it ends up saving memory because different objects can share parts. That is, at several different layers, the parts catalogues for a horse and a deer could end up having shapes in common; to some extent, the same probably holds true for horses and cars. Wherever a shape is shared between two or more catalogues, the system needs to store it only once. In their new paper, the researchers show that, as they add the ability to recognize more objects to their system, the average number of parts per object steadily declines.

Although the researchers' work promises more efficient use of computer memory and programmers' time, "it is far more important than just a better way to do object recognition," says Tai Sing Lee, an associate professor of computer science at Carnegie Mellon University. "This work is important partly because I feel it speaks to a couple scientific mysteries in the brain." Lee points out that visual processing in humans seems to involve five to seven distinct brain regions, but no one is quite sure what they do. The researchers' new object recognition system doesn't specify the number of layers in each hierarchical model; the system simply assembles as many layers as it needs. "What kind of stunned me is that [the] system typically learns five to seven layers," Lee says. That, he says, suggests that it may perform the same types of visual processing that takes place in the brain.

In their paper, the MIT and UCLA researchers report that, in tests, their system performed as well as existing object-recognition systems. But that's still nowhere near as well as the human brain. Lee says that the researchers' system currently focuses chiefly on detecting the edges of two-dimensional depictions of objects; to approach the performance of the human brain, it will have to incorporate a lot of additional information about surface textures and three-dimensional contours, as the brain does. Zhu adds that he and his colleagues are also pursuing other applications of their technology. For instance, their hierarchical models naturally lend themselves not only to automatic object recognition -- determining what an object is -- but also automatic object segmentation -- labeling an object's constituent parts.


Story Source:

The above story is based on materials provided by Massachusetts Institute of Technology. The original article was written by Larry Hardesty, MIT News Office. Note: Materials may be edited for content and length.


Cite This Page:

Massachusetts Institute of Technology. "Seeing the forest for the trees: Object recognition systems that break images into ever smaller parts should be much more efficient." ScienceDaily. ScienceDaily, 17 May 2010. <www.sciencedaily.com/releases/2010/05/100511104633.htm>.
Massachusetts Institute of Technology. (2010, May 17). Seeing the forest for the trees: Object recognition systems that break images into ever smaller parts should be much more efficient. ScienceDaily. Retrieved April 16, 2014 from www.sciencedaily.com/releases/2010/05/100511104633.htm
Massachusetts Institute of Technology. "Seeing the forest for the trees: Object recognition systems that break images into ever smaller parts should be much more efficient." ScienceDaily. www.sciencedaily.com/releases/2010/05/100511104633.htm (accessed April 16, 2014).

Share This



More Computers & Math News

Wednesday, April 16, 2014

Featured Research

from universities, journals, and other organizations


Featured Videos

from AP, Reuters, AFP, and other news services

Google Patents Contact Lens Cameras; Internet Is Wary

Google Patents Contact Lens Cameras; Internet Is Wary

Newsy (Apr. 15, 2014) — Google has filed for a patent to develop contact lenses capable of taking photos. The company describes possible benefits to blind people. Video provided by Newsy
Powered by NewsLook.com
The Walking, Talking Oil-Drigging Rig

The Walking, Talking Oil-Drigging Rig

Reuters - Business Video Online (Apr. 15, 2014) — Pennsylvania-based Schramm is incorporating modern technology in its next generation oil-drigging rigs, making them smaller, safer and smarter. Ernest Scheyder reports. Video provided by Reuters
Powered by NewsLook.com
NSA Leaks Net Pulitzer For Guardian, Washington Post

NSA Leaks Net Pulitzer For Guardian, Washington Post

Newsy (Apr. 14, 2014) — The Pulitzer Prize for Public Service was awarded to The Washington Post and The Guardian for their work covering the NSA's surveillance programs. Video provided by Newsy
Powered by NewsLook.com
Google Buys Drone Maker, Hopes to Connect Rural World

Google Buys Drone Maker, Hopes to Connect Rural World

Newsy (Apr. 14, 2014) — Formerly courted by Facebook, Titan Aerospace will become a part of Google's quest to blanket the world in Internet connectivity. Video provided by Newsy
Powered by NewsLook.com

Search ScienceDaily

Number of stories in archives: 140,361

Find with keyword(s):
 
Enter a keyword or phrase to search ScienceDaily for related topics and research stories.

Save/Print:
Share:  

Breaking News:
from the past week

In Other News

... from NewsDaily.com

Science News

Health News

Environment News

Technology News



Save/Print:
Share:  

Free Subscriptions


Get the latest science news with ScienceDaily's free email newsletters, updated daily and weekly. Or view hourly updated newsfeeds in your RSS reader:

Get Social & Mobile


Keep up to date with the latest news from ScienceDaily via social networks and mobile apps:

Have Feedback?


Tell us what you think of ScienceDaily -- we welcome both positive and negative comments. Have any problems using the site? Questions?
Mobile iPhone Android Web
Follow Facebook Twitter Google+
Subscribe RSS Feeds Email Newsletters
Latest Headlines Health & Medicine Mind & Brain Space & Time Matter & Energy Computers & Math Plants & Animals Earth & Climate Fossils & Ruins