Computer scientists at Washington University in St. Louis have patented two major inventions that should make Internet applications like e-mail, the World Wide Web and electronic commerce 10 times faster than they are now. George Varghese, Ph.D., associate professor of computer science in the School of Engineering and Applied Science, has collaborated with three Washington University colleagues in developing processes that enable the "lookup" for an Internet address to be done at the unfathomable speed of 100 nanoseconds, compared with the current average, a none-too-shabby 1.2 microseconds. This 10-fold increase in speed promises to give routers a faster throughput by making it easier to find the destination addresses for their messages. Varghese presented a paper on the techniques at a fall networking conference of the Association for Computing Machinery.
The processes are mathematical techniques that sort the 32-bit prefix address possibilities of Internet messages into groups that can be searched far faster than the painstaking bit-by-bit process of the past 20 years. If the difference between nanoseconds and microseconds seems trivial, consider that: the number of computers on the Internet is tripling every two years; the applications are increasingly more complex than e-mail and ASCI text files, now involving multimedia, audio and video as well as print data; and the speed of links (the lines that send data packets) is increasing 12 times over its present rate of 45 megabits (a megabit is one million bits) per second, into the gigabit (1,000 million bits per second) range.
While links are faster, routers -- hardware that route Internet messages, much like automated processors in the U.S. Postal Service -- have neither the speed nor the memory to keep up with the data. Today's fastest routers forward messages at a maximum rate of 100,000 to 500,000 messages a second. To keep up with communication link speeds in the gigabit range, a router has to forward five million messages (of an average size of about 1,000 bits) per second. If routers don't get up to speed, bottlenecks, delays and unhappy Internet customers are right around the corner.
"There is a shoot-out going on in the Wild West of Internet Country, where established network vendors and a flurry of start-ups are all vying to provide the fastest Internet message forwarding rates," says Varghese. "Washington University is among the Wild Bill Hickoks and Wyatt Earps who've entered the shooting match. There is strong commercial interest in the techniques, but we'll have to wait until the smoke clears to see how we do."
Think of a router as a post office that gets your e-mail message with a self-contained Internet (IP) destination address. The post office needs to look up the address in a forwarding table that describes which of its many output links the message must be routed through. Once this is determined, the message is sent on that link.
Internet address lookup would be simple if an IP destination address could be found in a table that lists the output link for each Internet address in the world. In this case, the lookup could be done similar to a dictionary search. The only catch is that a router would have to keep millions of entries in its database corresponding to the millions of computers on the Internet. To reduce database size, a router database consists of a smaller set of prefixes. This reduces database size, but at the cost of requiring a more complex lookup, called longest matching prefix.
Varghese offers a metaphor to explain how prefixes work. Consider a flight database in London that could list flights to a thousand U.S. cities. Suppose most flights hub through Boston, except those to California, which hub through Los Angeles. The flight database can be reduced from 1,000 entries to just two prefix entries: These are USA* to Boston; USA.CA* to L.A. (the asterisk denotes a wildcard of information that follows). There is a quirk: a destination city such as USA.CA. Fresno now matches both prefixes. In that case it must go to the longest match (USA.CA*).
The Internet address lookup problem is similar. To understand it, forget your individual quirky e-mail address of numbers and letters. The Internet works in the binary world where everything is a 1 or zero, a combination of both up to 32 bits, which is the destination address placed in every Internet message. "The Internet address quandary is the problem of finding the longest matching prefix," Varghese says.
A destination address whose first six bits are 100100 would match both prefix P1, 10*, and P2, 1001*. But, because P2 is the longer match, messages for this address should be sent to P2, just as USA.CA* was preferred over USA* in Varghese's example.
Tackling a 20-year problem
Today's routers in the Internet backbone have about 40,000 prefix entries instead of millions of possible Internet addresses. A simple lookup solution would compare the address in each packet to each of these 40,000 prefixes to find the longest match, but that would be much too slow. Presently, there are two ways of finding prefix matches. One breaks up the large data base into sub-data bases of prefixes with the same length. The other, called a trie (tree), is similar to a branching family tree. It is a system of nodes, each one having a table of mathematical pointers that route the addresses to the proper link. Both methods essentially search for a longest match among all possible prefix lengths. Because there are 32 possible prefix lengths, both schemes are too slow to work with the next generation of routers.
The Washington University researchers have added two new techniques to tackle this 20-year-old problem. The first method, developed by Varghese and his graduate student, Venkatachary Srinivasan, recognizes that the existing schemes will work faster if they can search through databases that have a smaller number of distinct prefix lengths. Using a technique called controlled prefix expansion, their method takes an existing database with 32 possible prefix lengths and transforms it into a new one with a much smaller number of prefix lengths. The existing schemes then can be run faster on the new database.
"The trie, for instance, now can be worked in multiple bits at a time instead of walking through the trie one bit at a time," Varghese says. "For example, if we walk the trie in larger strides of eight bits, we need only four strides to cover 32 bits instead of 32 strides if we go one bit at a time. This translates into a factor of eight improvement in performance. It's basically a simple trick to make routers more successful. We are putting all our eggs in fewer baskets."
The second method, developed with Jonathan S. Turner, Ph.D., Henry Edwin Sever Professor of Engineering, and visiting graduate student Marcel Walgvogel, uses a completely different approach. Instead of searching for prefixes one length at a time, 32 different ways, the second method divides the problem in half at different stages in the search and, with the aid of pre-computation and markers, provides a quicker search, especially for longer prefix lengths. On the horizon, Internet routing protocols soon will have to look up 128-bit addresses, as the Internet prepares to retrofit to serve a global community of users and user devices.
"In the future, your toaster or air conditioner could have an Internet address, which really exacerbates the problem," Varghese notes, explaining that such household or office items can be controlled and programmed through the Internet. Eight companies have signed nondisclosure agreements with Varghese and Washington University to examine the Washington University techniques. Licensing seems an imminent outcome.
"Some vendors like our second technique because it scales well to the next generation 128-bit addresses, using only seven questions to determine an answer," he says. "Other vendors like our multibit trie scheme for the existing Internet because it is simple and fast. Either way, we're glad we can be a factor in speeding up the Internet."
Materials provided by Washington University In St. Louis. Note: Content may be edited for style and length.
Cite This Page: