Featured Research

from universities, journals, and other organizations

Scientists Explain And Improve Upon 'Enigmatic' Probability Formula

Date:
October 21, 2003
Source:
University Of California, San Diego
Summary:
Scientists at the University of California, San Diego (UCSD) have developed new insight into a formula that helped British cryptanalysts crack the German Enigma code in World War II.

Scientists at the University of California, San Diego (UCSD) have developed new insight into a formula that helped British cryptanalysts crack the German Enigma code in World War II. Writing in the Oct. 17 edition of the journal Science, UCSD Jacobs School of Engineering professor Alon Orlitsky and graduate students Narayana P. Santhanam and Junan Zhang shed light on a lingering mathematical mystery and propose a new solution that could help improve automatic speech recognition, natural language processing, and other machine learning software.

In the article, Orlitsky and his colleagues unlock some of the secrets of the "Good-Turing estimator," a formula for estimating the probability of elements based on observed data. The formula is named after the famed mathematicians I.J. Good and Alan Turing, who during WWII were among a group of cryptanalysts charged with breaking the Enigma cipher -- the code used to encrypt German military communications. They worked at Bletchley Park outside of London, and their work has been credited by some with shortening the war by several years. (It also led to the development of the first modern computer, and was documented in a number of books and movies.)

The cryptanalysts were greatly aided by their possession of the Kenngruppenbuch, the German cipher book that contained all possible secret keys to Enigma and that had previously been captured by British Intelligence. They documented the keys used by various U-boat commanders in previously decrypted messages and used this information to estimate the distribution of pages from which commanders picked their secret keys.

The prevailing technique at the time estimated the likelihood of each page by simply using its empirical frequency, the fraction of the time it had been picked in the past. But Good and Turing developed an unintuitive formula that bore little resemblance to conventional estimators. Surprisingly, this Good-Turing estimator outperformed the more intuitive approaches. Following the war, Good published the formula, mentioning that Turing had an "intuitive demonstration" for its power, but not describing what that demonstration entailed.
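The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration: the article does not state the formula, so the code assumes the standard textbook form of the basic Good-Turing estimate, in which an item seen r times gets the adjusted count (r + 1) * N_{r+1} / N_r, where N_r is the number of distinct items seen exactly r times, and the fraction N_1 / N of the total is set aside for items never seen. The function names and the page labels in the usage note are hypothetical.

```python
from collections import Counter


def empirical_frequency(observations):
    """Estimate each item's probability as its share of past observations."""
    total = len(observations)
    counts = Counter(observations)
    return {item: c / total for item, c in counts.items()}


def good_turing(observations):
    """Basic (unsmoothed) Good-Turing estimate, textbook form (assumed).

    Returns (probabilities, unseen_mass), where unseen_mass is the total
    probability set aside for items that were never observed.
    """
    total = len(observations)
    counts = Counter(observations)
    # n_r[r] = number of distinct items observed exactly r times
    n_r = Counter(counts.values())
    probs = {}
    for item, r in counts.items():
        # Adjusted count r* = (r + 1) * N_{r+1} / N_r
        adjusted = (r + 1) * n_r.get(r + 1, 0) / n_r[r]
        probs[item] = adjusted / total
    # Items seen exactly once determine the mass reserved for unseen items
    unseen_mass = n_r.get(1, 0) / total
    return probs, unseen_mass
```

For example, with the hypothetical observations `["p1", "p1", "p1", "p2", "p2", "p3", "p4"]`, empirical frequency gives page `p1` a probability of 3/7 and leaves nothing for pages not yet seen, while the sketch above reserves 2/7 for unseen pages. In this unsmoothed form the most frequent item can receive probability 0 because no item was seen one more time than it was; practical versions smooth the N_r counts to avoid that.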

Since then, Good-Turing has been incorporated into a variety of applications such as information retrieval, spell-checking, and speech recognition software, where it is used to automatically learn the underlying structure of the language. But despite its usefulness, "its performance has remained something of an enigma itself," said Orlitsky, a professor in the Electrical and Computer Engineering department. While some partial explanations have been given as to why Good-Turing may work well, no objective evaluation of its performance and no optimality results have been established. Additionally, scientists observed that while it worked well under many circumstances, at times its performance was lacking.

Now, Orlitsky, Santhanam, and Zhang believe they have unraveled some of the mystery surrounding Good-Turing, and constructed a new estimator that, unlike the historic formula, is reliable under all conditions. Motivated by information-theoretic and machine-learning considerations, they propose a natural measure for the performance of an estimator. Called attenuation, it evaluates the highest possible ratio between the probability assigned to each symbol in a sequence by any distribution, and the corresponding probability assigned by the estimator.

The UCSD researchers show that intuitive estimators, such as empirical frequency, can attenuate the probability of a symbol by an arbitrary amount. They also prove that Good-Turing performs well in general. While it can attenuate the probability of symbols by a factor of 1.39, it never attenuates by a factor of more than 2. Motivated by these observations, they derive an estimator whose attenuation is 1. This means that, as the length of any sequence increases, the probability assigned to each symbol by the new estimator is as high as that assigned to it by any distribution.
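The claim about empirical frequency can be illustrated with a small Python sketch. It is a simplified reading of the ratio described above, not the formal definition used in the Science article: it compares the probability that a sequential empirical-frequency estimator assigns to a whole sequence with the highest probability any i.i.d. distribution assigns to that sequence. The function names are hypothetical, and skipping the first symbol (which has no history to estimate from) is an assumption that only favors the estimator.

```python
from collections import Counter
import math


def empirical_sequence_probability(sequence):
    """Probability the empirical-frequency estimator assigns to a sequence
    when each symbol is predicted from the symbols that came before it."""
    seen = Counter()
    prob = 1.0
    for i, symbol in enumerate(sequence):
        if i > 0:
            # Fraction of earlier positions holding this symbol
            prob *= seen[symbol] / i
        # Assumption: the first symbol has no history, so it is not scored
        seen[symbol] += 1
    return prob


def best_iid_probability(sequence):
    """Highest probability any i.i.d. distribution assigns to the sequence,
    attained by the maximum-likelihood distribution."""
    n = len(sequence)
    counts = Counter(sequence)
    return math.prod((c / n) ** c for c in counts.values())


def ratio(sequence):
    """Best-distribution probability divided by the estimator's probability."""
    q = empirical_sequence_probability(sequence)
    p = best_iid_probability(sequence)
    return math.inf if q == 0 else p / q
```

Under these assumptions, `ratio("aaa")` is 1.0, but `ratio("aab")` is infinite: the estimator gives the first appearance of `b` probability 0, while the best distribution gives the sequence probability 4/27. Any sequence in which a new symbol appears after the first position behaves the same way, which is one way to see why the ratio for empirical frequency has no upper bound.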

"While there is a considerable amount of work to be done in simplifying and further improving the new estimator," concluded Orlitsky, "we hope that this new framework will eventually improve language modeling and hence lead to better speech recognition and data mining software."


Story Source:

The above story is based on materials provided by University Of California, San Diego. Note: Materials may be edited for content and length.


Cite This Page:

University Of California, San Diego. "Scientists Explain And Improve Upon 'Enigmatic' Probability Formula." ScienceDaily. ScienceDaily, 21 October 2003. <www.sciencedaily.com/releases/2003/10/031020055436.htm>.
