Featured Research

from universities, journals, and other organizations

Tool to improve Wikipedia accuracy developed

Date:
September 26, 2010
Source:
University of Iowa
Summary:
Check the Microsoft entry on Wikipedia at some point in the past and you might have learned that the company's name is Microshaft, its products are evil and its logo is a kitten. Similarly, you may have learned from Abraham Lincoln's Wikipedia entry that he was married to Brayson Kondracki, his birth date is March 14 and Pete likes PANCAKES.

None of these claims is correct or relevant, but they all showed up at one time or another in the online encyclopedia's listings. They also illustrate one of the challenges facing Wikipedia -- finding and undoing the malicious editing that introduces material that is incorrect, misleading, editorializing or just plain bizarre.

But a group of University of Iowa researchers is developing a new tool that can detect potential vandalism and improve the accuracy of Wikipedia entries. The tool is an algorithm that checks new edits to a page and compares them to words in the rest of the entry, then alerts an editor or page manager if something doesn't seem right.

Tools already exist that try to weed out potential vandalism, and they are quite useful in many cases, said Si-Chi Chin, a graduate student in UI's Interdisciplinary Graduate Program in Informatics. Those tools rely on rules and screens that spot obscenities or vulgarities, or major edits such as deletions of entire sections or significant changes throughout a document (changing "Microsoft" to "Apple" in the Microsoft entry, for instance).

But those tools are built manually, with prohibited words and phrases entered by hand, so they're time-consuming to build and easy to evade. They also aren't as good at catching smaller types of vandalism, which led Chin and her professors to develop the automated tool. They recently tested the algorithm by reviewing all the edits made to the Abraham Lincoln and Microsoft entries, Wikipedia's two most vandalized pages, to see how many of the pernicious edits it could find. That meant reviewing more than 4,000 edits in each entry. Some are still on the page, but most have been deleted and archived.

As described in their paper, "Detecting Wikipedia Vandalism with Active Learning and Statistical Language Models," the statistical language model algorithm works by flagging words or vocabulary patterns that appear nowhere else in the entry's history since it was first written. For instance, when someone wrote "Pete loves PANCAKES" into Lincoln's entry, the algorithm recognized the graffiti as potential vandalism after scanning the rest of the entry.
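The idea of scoring an edit against the vocabulary of an entry's history can be illustrated with a small sketch. This is not the researchers' implementation, which the article does not describe in detail; it assumes a simple unigram model with add-one smoothing, and the function names (`build_model`, `unseen_words`, `avg_log_prob`) and the sample revisions are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_model(history):
    """Count every word across all past revisions of one entry."""
    counts = Counter()
    for revision in history:
        counts.update(tokenize(revision))
    return counts


def unseen_words(counts, edit_text):
    """Return the words in an edit that never appear in the entry's history."""
    return [w for w in tokenize(edit_text) if counts[w] == 0]


def avg_log_prob(counts, edit_text):
    """Average per-word log-probability under an add-one-smoothed unigram model.

    Assumption: add-one smoothing is used here only for illustration.
    """
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 reserves probability mass for unseen words
    words = tokenize(edit_text)
    if not words:
        return 0.0
    return sum(math.log((counts[w] + 1) / (total + vocab)) for w in words) / len(words)


# Hypothetical revision history, invented for this example.
history = [
    "Abraham Lincoln was the 16th president of the United States.",
    "Lincoln was born on February 12, 1809, in Kentucky.",
]
model = build_model(history)

for edit in ("Lincoln was born in Kentucky", "Pete loves PANCAKES"):
    print(edit, unseen_words(model, edit), round(avg_log_prob(model, edit), 2))
```

In this sketch, an edit made up of words the entry has never contained scores a lower average log-probability than an edit that reuses the entry's vocabulary, so a threshold on that score (or on the count of unseen words) could be used to route the edit to a human reviewer.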

"It determines the probability of each word appearing, and because the word 'pancakes' didn't turn up anywhere else in the history of Lincoln's entry, the algorithm saw it as something new and possible graffiti," Chin said.

In all, the statistical language model algorithm caught more of the vandalism in some categories than existing tools.

"Experimental results show that our approach can identify both large-scale and small-scale vandalism and is strong in filtering out various types of graffiti and misinformation instances," said Padmini Srinivasan, a professor of computer science and one of Chin's co-researchers.

The algorithm detected about half of the graffiti in both the Lincoln and Microsoft entries, and about a quarter of the large-scale editing and misinformation types of vandalism. It was less successful in detecting link spam (hyperlinking to irrelevant or non-existent websites) or image attacks (replacing a portrait of Lincoln with a photo of a redwood tree, a change that managed to survive for two years and 4,000 edits). But those are particularly difficult to detect with vocabulary algorithms because the tool can't read images, and link spam can only be found by manually clicking the link.

The algorithm also has the advantage of being able to adapt to catch future forms of vandalism. Co-researcher Nick Street, professor of management sciences in the Tippie College of Business, said it's not unlike a virus detector in that way.

"It learns to recognize changes so it keeps one step ahead of the vandals," he said.

Their paper, co-authored with David Eichmann of the UI Institute of Clinical and Translational Science, was presented recently at the Fourth Workshop on Information Credibility on the Web in Raleigh, N.C.


Story Source:

The above story is based on materials provided by University of Iowa. Note: Materials may be edited for content and length.


Cite This Page:

University of Iowa. "Tool to improve Wikipedia accuracy developed." ScienceDaily. ScienceDaily, 26 September 2010. <www.sciencedaily.com/releases/2010/09/100924212131.htm>.
University of Iowa. (2010, September 26). Tool to improve Wikipedia accuracy developed. ScienceDaily. Retrieved October 23, 2014 from www.sciencedaily.com/releases/2010/09/100924212131.htm
University of Iowa. "Tool to improve Wikipedia accuracy developed." ScienceDaily. www.sciencedaily.com/releases/2010/09/100924212131.htm (accessed October 23, 2014).
