Unmanned military vehicles could distinguish cave entrances from shadows and locate other hazards if they had a sense of vision similar to that of humans, say researchers at UCLA’s Henry Samueli School of Engineering and Applied Science.
A Pentagon spokesman, recently quoted on the difficulty of spotting caves in Afghanistan, said, “From a cockpit perspective, a cave looks like nothing more than a shadow on the ground.”
Stefano Soatto, assistant professor in UCLA’s computer science department and head of the engineering school’s vision lab, is studying how the human visual system works in order to pass that ability on to machines. “In practice, the human visual system is still by far the best around, but this may not be so for long,” Soatto said.
Soatto’s research team is examining how people use vision to interact with others and with their environment, and is designing systems that will allow computers to interact in similar ways.
“We use senses to build models of the world around us that allow us to walk through an unfamiliar environment and interact with it,” Soatto said. “I want a machine to be able to do the same thing.”
The projects under way at the UCLA Vision Lab all involve “dynamic vision,” the ability of a computer to take in visual sensory information about its surroundings and use what it “sees” of its changing environment to perform assigned tasks, such as exploring underground bunkers or monitoring bank vaults.
As Soatto explains, “The world has certain physical properties — shape, motion, material properties of objects and so forth. Humans have developed, over the course of evolution, a particular way of representing their environment that has been crucial for them to survive.”
Machines, especially computers, can also be made to interpret the physical world and interact with it, whether that environment is inside a nuclear reactor or on the operating table.
Soatto is talking about much more than simple photography or video. “We know how to build cameras to capture images, we know how to build computers to crunch numbers, and we know how to build robots that move and perform pre-assigned tasks,” Soatto said. “However, we still do not know how to put everything together and endow a machine with a sense of vision.”
For a computer to perform “real-world” tasks, it must do more than simply capture and analyze a photograph. Using only that information, a computer cannot distinguish a photograph of a scene from the scene itself. To interact with a changing environment, the computer needs additional information about the spatial properties of the environment (shape, motion, distances, angles), measurable properties that can be recovered only as images change over time. That requires multiple points of view, obtained either because the scene moves or because the viewer’s perspective changes. Only then can a three-dimensional representation of the world be created.
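Why a second viewpoint matters can be illustrated with two-view triangulation. The sketch below is a minimal illustration, not the UCLA lab’s method: it assumes the camera positions are already known and that the same scene feature has been matched in both images, so each image contributes one viewing ray toward it. `triangulate_midpoint` is a hypothetical helper that uses the textbook midpoint method, returning the point halfway along the shortest segment joining the two rays. With a single viewpoint, both rays coincide and no depth can be recovered.

```python
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product of two 3-vectors."""
    return sum(a * b for a, b in zip(u, v))


def triangulate_midpoint(c1: Vec3, d1: Vec3, c2: Vec3, d2: Vec3) -> Optional[Vec3]:
    """Estimate a 3-D point seen along two viewing rays (hypothetical helper).

    Assumes known camera centres c1, c2 and ray directions d1, d2 pointing
    at the same matched feature. Returns the midpoint of the shortest segment
    between the rays, or None when the rays are (nearly) parallel, i.e. the
    two views carry no usable depth information.
    """
    w0 = [p - q for p, q in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) <= 1e-12 * a * c:
        return None  # parallel rays: a single viewpoint cannot fix depth
    s = (b * e - c * d) / denom  # parameter of the closest point on ray 1
    t = (a * e - b * d) / denom  # parameter of the closest point on ray 2
    p1 = [ci + s * di for ci, di in zip(c1, d1)]
    p2 = [ci + t * di for ci, di in zip(c2, d2)]
    return tuple((x + y) / 2 for x, y in zip(p1, p2))
```

For example, `triangulate_midpoint((0, 0, 0), (1, 2, 10), (1, 0, 0), (0, 2, 10))` recovers the point `(1.0, 2.0, 10.0)` from two cameras one unit apart, while passing the same camera centre and direction twice returns `None`. The midpoint method is chosen here for brevity; practical systems more often triangulate from calibrated projection matrices and then refine the estimate against image noise.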
Consider face-recognition security systems. Sometimes used at banks, airports and even public events, these systems are designed to recognize and allow passage to certain people while denying entry to strangers. But the system can be fooled in ways that the human visual system cannot, Soatto said.
“Current systems capture one image of your face, match it to a database and can recognize it as yours and let you in. However, if an intruder shows up with a photograph of your face, the system would not be able to distinguish that 2-D photograph
Materials provided by University Of California - Los Angeles. Note: Content may be edited for style and length.