Computer Users Are Digitizing Books Quickly And Accurately With New Method

Date: August 19, 2008
Source: Carnegie Mellon University
Summary: Millions of computer users collectively transcribe the equivalent of 160 books each day with better than 99 percent accuracy, despite the fact that few spend more than a few seconds on the task and that most do not realize they are doing valuable work, Carnegie Mellon University researchers report.

Carnegie Mellon computer scientist Luis von Ahn.
Credit: Image courtesy of Carnegie Mellon University

Millions of computer users collectively transcribe the equivalent of 160 books each day with better than 99 percent accuracy, despite the fact that few spend more than a few seconds on the task and that most do not realize they are doing valuable work, Carnegie Mellon University researchers reported recently in Science Express.

They can work so prodigiously because Carnegie Mellon computer scientists led by Luis von Ahn have taken a widely used Web site security measure, called a CAPTCHA, and given it a second purpose — digitizing books produced prior to the computer age. When Web visitors solve one of the distorted-letter puzzles so they can register for email or post a comment on a blog, they simultaneously help turn the printed word into machine-readable text.

More than a year after deploying their version, called reCAPTCHA (http://recaptcha.net/), on thousands of Web sites worldwide, the researchers conclude that their word-deciphering process achieves the industry standard for human transcription services: better than 99 percent accuracy. Their report, published online today, will appear in an upcoming issue of the journal Science.

Furthermore, the amount of work that can be accomplished is herculean. More than 100 million CAPTCHAs are solved every day and, though each puzzle takes only a few seconds to solve, the aggregate time amounts to hundreds of thousands of hours of human effort that can potentially be tapped. During the reCAPTCHA system's first year of operation, more than 1.2 billion reCAPTCHAs were solved and more than 440 million words were deciphered. That's the equivalent of manually transcribing more than 17,600 books.

"More Web sites are adopting reCAPTCHAs each day, so the rate of transcription keeps growing," said von Ahn, an assistant professor in the School of Computer Science's Computer Science Department. "More than 4 million words are being transcribed every day. It would take more than 1,500 people working 40 hours a week at a rate of 60 words a minute to match our weekly output."
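
The figures quoted in this story can be cross-checked with a little arithmetic. The sketch below, in Python, takes the reported first-year totals (440 million words, 17,600 books) and the reported daily rate (4 million words) and derives the book length they imply and the resulting books-per-day figure. The words-per-book value is inferred from those totals, not stated by the researchers, and every quoted number is a "more than" lower bound, so the results are approximate.

```python
# Figures quoted in the article (each is reported as "more than").
WORDS_FIRST_YEAR = 440_000_000  # words deciphered in the first year
BOOKS_FIRST_YEAR = 17_600       # equivalent number of books transcribed
WORDS_PER_DAY = 4_000_000       # words transcribed per day


def implied_words_per_book(words: int, books: int) -> float:
    """Average book length implied by the first-year totals."""
    return words / books


def books_per_day(words_per_day: int, words_per_book: float) -> float:
    """Daily output expressed in book equivalents."""
    return words_per_day / words_per_book


words_per_book = implied_words_per_book(WORDS_FIRST_YEAR, BOOKS_FIRST_YEAR)
daily_books = books_per_day(WORDS_PER_DAY, words_per_book)

print(f"Implied words per book: {words_per_book:,.0f}")
print(f"Book equivalents per day: {daily_books:,.0f}")
```

Under these assumptions the totals imply about 25,000 words per book, and 4 million words a day works out to the 160 books a day cited at the top of the story.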

Von Ahn said reCAPTCHAs are being used to digitize books for the Internet Archive and to digitize newspapers for The New York Times. Digitization allows older works to be indexed, searched, reformatted and stored in the same way as today's online texts.

Old texts are typically digitized by photographically scanning pages and then transforming the text using optical character recognition (OCR) software. But when ink has faded and paper has yellowed, OCR sometimes can't recognize some words — as many as one out of every five, according to the Carnegie Mellon team's tests. Without reCAPTCHA, these words must be deciphered manually at great expense.

Conventional CAPTCHAs, which were developed at Carnegie Mellon, involve letters and numbers whose shapes have been distorted or backgrounds altered so that computers can't recognize them, but humans can. To create reCAPTCHAs, the researchers use images of words from old texts that OCR systems have had trouble reading.
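
This story does not spell out how a Web visitor's answer becomes trusted text. One plausible scheme, sketched here in Python as an illustration and not as the researchers' implementation, shows each unreadable word alongside a control word whose answer is already known, counts a guess only when the visitor gets the control word right, and accepts a transcription once enough visitors agree. The names `Submission` and `resolve_word` and the threshold `min_votes` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Submission:
    """One visitor's response to a two-word puzzle (hypothetical structure)."""
    passed_control: bool  # did the visitor type the known control word correctly?
    unknown_answer: str   # what the visitor typed for the word OCR could not read


def resolve_word(submissions: Iterable[Submission], min_votes: int = 3) -> Optional[str]:
    """Return the agreed transcription, or None if agreement is too weak.

    Only answers from visitors who passed the control word are counted.
    The min_votes threshold is an assumed value chosen for illustration.
    """
    votes = Counter(
        s.unknown_answer.strip().lower()
        for s in submissions
        if s.passed_control
    )
    if not votes:
        return None
    word, count = votes.most_common(1)[0]
    return word if count >= min_votes else None


answers = [
    Submission(True, "morning"),
    Submission(True, "Morning "),
    Submission(False, "xyzzy"),    # failed the control word, so it is ignored
    Submission(True, "morning"),
    Submission(True, "moming"),    # a plausible misreading, outvoted
]
print(resolve_word(answers))
```

Requiring agreement among several independent visitors is one way a system like this could reach high accuracy even though any single answer may be wrong.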

Helping to make old books and newspapers more accessible to a computerized world is something that the researchers find rewarding, but is only part of a larger goal. "We are demonstrating that we can take human effort — human processing power — that would otherwise be wasted and redirect it to accomplish tasks that computers cannot yet solve," von Ahn said.

For instance, he and his students have developed online games, available at http://www.gwap.com, in which players analyze photos and audio recordings, tasks beyond the capability of computers. Similarly, University of Washington biologists recently built Foldit (http://fold.it/), a game in which people compete to determine the ideal structure of a given protein.

In addition to von Ahn, authors of the new report include computer science undergraduate Benjamin Maurer, graduate students Colin McMillen and David Abraham, and Manuel Blum, professor of computer science.


Story Source:

The above story is based on materials provided by Carnegie Mellon University. Note: Materials may be edited for content and length.


Cite This Page:

Carnegie Mellon University. "Computer Users Are Digitizing Books Quickly And Accurately With New Method." ScienceDaily. ScienceDaily, 19 August 2008. <www.sciencedaily.com/releases/2008/08/080814154329.htm>.
