Recognising objects and groups of objects is something we humans take for granted. For computers, this is far from straightforward. A European project has come up with novel solutions to this conundrum.
Imagine your friends have blindfolded you and taken you to a “secret location”. When they take off your blindfold, you immediately see a group of people around you and realise that they have thrown you a surprise birthday party. How did you know? Because everyone shouted “surprise”, and there were balloons, a birthday cake and booze.
The question may seem like a silly one, but the processes involved in answering it are far from simple. In fact, you had to collate a great deal of visual and other sensory data, cross-reference it with your memories, and make mental deductions.
“Vision is our most important sense and about half of the human brain is involved with vision in one way or another,” explains Luc Van Gool of Belgium’s Leuven University (KUL), who also leads the Computer Vision Laboratory at the Swiss Federal Institute of Technology (ETH). “Enabling us to recognise the objects and places around us is a task it performs brilliantly.”
Yet what we regard as the simple process of “recognition” would leave many computers stumped. Even recognising something as apparently simple as a birthday cake would normally require a computer to be fed information on what cakes generally look like, the shapes and sizes they come in, and the candles and other decorations likely to adorn them.
“The same object will look different depending on the viewpoint, the illumination, or the occlusions caused by other objects in front,” notes Van Gool.
Points of view
In brief, computers might be able to calculate pi to hundreds of decimal places and model complex weather patterns, yet without complex and painstaking programming they may find it impossible to recognise a person who has grown their hair, or to realise that Chihuahuas and Dobermans belong to the same species.
Van Gool is involved in a project, Cognitive-Level Annotation Using Latent Statistical Structure (CLASS, http://class.inrialpes.fr/), which is developing technologies to visually recognise specific objects, such as your car, or classes of object, such as any car on the street.
“The recognition of an object as belonging to a particular group is a harder problem for a computer than the recognition of a specific object. The reason is that object classes show large variability among their members,” Van Gool points out.
The 3.5-year, EU-funded project improved on previous efforts. It developed a system that describes objects by the appearance of many separate, small patches. Because each patch covers only a small part of an object, many patches can still match when others are hidden or seen under different conditions, and such localised features give the system the robustness it needs to cope with the large variations mentioned earlier. In addition, CLASS created special mechanisms, known as efficient approximate neighbourhood searches, for comparing an image or an object against huge numbers of reference images.
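The article does not say which patch descriptors or search structures CLASS actually used, so the following is only a minimal, hypothetical Python sketch of the general idea. Raw pixel patches normalised to zero mean and unit length stand in for real local descriptors. Random-hyperplane hashing, a simple form of locality-sensitive hashing, stands in for CLASS's approximate neighbourhood search. Each query patch then "votes" for the reference object whose patch it most resembles. All function and class names, and the toy striped and checkered images, are illustrative assumptions.

```python
import math
import random
from collections import Counter, defaultdict


def extract_patches(image, size=4, stride=2):
    """Cut a grayscale image (list of rows of floats) into small square patches."""
    rows, cols = len(image), len(image[0])
    patches = []
    for r in range(0, rows - size + 1, stride):
        for c in range(0, cols - size + 1, stride):
            vec = [image[r + i][c + j] for i in range(size) for j in range(size)]
            mean = sum(vec) / len(vec)
            centred = [v - mean for v in vec]
            norm = math.sqrt(sum(v * v for v in centred))
            if norm > 1e-9:  # skip flat, featureless patches
                # Zero mean and unit length: uniform brightness/contrast changes cancel out.
                patches.append([v / norm for v in centred])
    return patches


class ApproximateIndex:
    """Hypothetical index: random hyperplanes hash similar vectors into the same
    bucket, so a query is compared only with reference patches sharing its bucket."""

    def __init__(self, dim, n_planes=8, seed=0):
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)]
        self.buckets = defaultdict(list)

    def _key(self, vec):
        # One bit per hyperplane: which side of the plane the vector falls on.
        return tuple(sum(p * v for p, v in zip(plane, vec)) >= 0 for plane in self.planes)

    def add(self, vec, label):
        self.buckets[self._key(vec)].append((vec, label))

    def nearest_label(self, vec):
        candidates = self.buckets.get(self._key(vec), [])
        if not candidates:
            return None  # approximate: the true neighbour may sit in another bucket
        # Highest dot product between unit vectors = most similar patch.
        best = max(candidates, key=lambda item: sum(a * b for a, b in zip(item[0], vec)))
        return best[1]


def recognise(query_image, index):
    """Each local patch votes for the object its nearest reference patch came from."""
    votes = Counter()
    for patch in extract_patches(query_image):
        label = index.nearest_label(patch)
        if label is not None:
            votes[label] += 1
    return votes.most_common(1)[0][0] if votes else None


# Toy usage with synthetic 12x12 "objects".
def stripes(n=12):
    return [[((c // 2) % 2) * 255.0 for c in range(n)] for _ in range(n)]


def checker(n=12):
    return [[((r // 2 + c // 2) % 2) * 255.0 for c in range(n)] for r in range(n)]


def relit(image):
    # The same object under different lighting: lower contrast, brighter overall.
    return [[0.5 * v + 100.0 for v in row] for row in image]


index = ApproximateIndex(dim=16)  # 4x4 patches -> 16-dimensional descriptors
for label, img in [("stripes", stripes()), ("checker", checker())]:
    for patch in extract_patches(img):
        index.add(patch, label)

print(recognise(relit(stripes()), index))  # → stripes
print(recognise(relit(checker()), index))  # → checker
```

Because each patch is normalised, the relit queries produce the same descriptors as the stored originals and are still recognised. Adding hyperplanes makes buckets smaller and lookups cheaper, but it also raises the chance that a true neighbour lands in a different bucket. Practical hashing schemes usually compensate by using several hash tables.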
A picture speaks a thousand words
The specific object recognition technology developed by CLASS has already found a commercial application. Through a company called kooaba, CLASS technology lets mobile phone subscribers who install the relevant software photograph, say, a monument, a film poster or an album cover with their handset and get relevant online information about it.
“It’s like the object itself becomes the link to further information,” observes Van Gool. He expects applications of this technology to expand rapidly. For instance, cities and museums may offer interactive guided tours or guidebooks through kooaba.