Featured Research

from universities, journals, and other organizations

Getting to the bottom of statistics: Software utilizes data from the Internet for interpreting statistics

Date:
July 16, 2012
Source:
Technische Universität Darmstadt
Summary:
Interpreting the results of statistical surveys, such as Transparency International’s corruption indices, is not always simple. As Dr. Heiko Paulheim of the Knowledge Engineering Group in the Computer Science Department at TU Darmstadt put it, “Although methods for finding explanations for statistics are available, they are confined to the data contained in the statistics themselves; background information is not taken into account. That is what led us to the idea of applying the data-mining methods we had been studying here to the semantic web, in order to obtain additional background information that will allow us to learn more from statistics.”

Explain-a-LOD helps interpret statistics such as Transparency International's Corruption Perceptions Index.
Credit: Transparency International (diagram)


The "Explain-a-LOD" tool developed by Paulheim accesses linked open data (LOD), i.e., enormous collections of publicly available, semantically linked data on the Internet, and uses that data to automatically formulate hypotheses for interpreting arbitrary kinds of statistics. First, the statistics to be interpreted are read into Explain-a-LOD. The tool then automatically searches the linked open data for related records and adds them to the initial data set. Paulheim explained: "If, for example, the country 'Germany' appears in the corruption-index data, LOD records containing information on Germany are identified, and further attributes are generated, such as its population, its membership in the EU and the OECD, or the total number of companies based there. Attributes that are unlikely to yield useful hypotheses are automatically deleted in order to reduce the size of these enriched statistics."
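The enrichment-and-filtering step described above can be sketched in a few lines. This is a minimal illustration, not Explain-a-LOD's implementation: the in-memory dictionary `TOY_LOD` is a hypothetical stand-in for a real query against the LOD cloud, its entities and values are invented placeholders, and the functions `enrich` and `filter_attributes`, together with the three pruning heuristics (too sparse, constant, identifier-like), are assumptions chosen for the example.

```python
from typing import Any, Dict, List

# Hypothetical stand-in for a linked-open-data source. A real system would look
# these attributes up in the LOD cloud; all entities and values here are invented
# placeholders for illustration only.
TOY_LOD: Dict[str, Dict[str, Any]] = {
    "CountryA": {"population": 80, "in_eu": True, "in_oecd": True, "label": "A", "type": "Country"},
    "CountryB": {"population": 5, "in_eu": False, "in_oecd": False, "label": "B", "type": "Country"},
    "CountryC": {"population": 40, "in_eu": True, "in_oecd": True, "label": "C", "type": "Country"},
}


def enrich(rows: List[Dict[str, Any]], key: str,
           lod: Dict[str, Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Stage 1a: add every attribute found for row[key] to the row.

    Original columns take precedence over generated ones on a name clash;
    rows whose entity is not found in the source are kept unchanged.
    """
    return [{**lod.get(row[key], {}), **row} for row in rows]


def filter_attributes(rows: List[Dict[str, Any]], protected: List[str],
                      max_missing: float = 0.5) -> List[Dict[str, Any]]:
    """Stage 1b: drop generated attributes unlikely to yield hypotheses.

    Assumed heuristics (not necessarily Explain-a-LOD's own): an attribute is
    dropped if it is missing in more than `max_missing` of the rows, if it has
    a single value everywhere (nothing to explain), or if it is a string that
    is distinct in every row (it behaves like an identifier).
    """
    n = len(rows)
    keep = set(protected)
    candidates = {a for row in rows for a in row} - keep
    for attr in candidates:
        values = [row[attr] for row in rows if attr in row]
        if len(values) < (1 - max_missing) * n:
            continue  # too sparse
        distinct = set(values)
        if len(distinct) <= 1:
            continue  # constant across all rows
        if len(distinct) == len(values) and all(isinstance(v, str) for v in values):
            continue  # identifier-like
        keep.add(attr)
    return [{a: v for a, v in row.items() if a in keep} for row in rows]
```

For input rows such as `{"country": "CountryA", "corruption_index": 2.0}`, calling `filter_attributes(enrich(stats, "country", TOY_LOD), protected=["country", "corruption_index"])` keeps `population`, `in_eu` and `in_oecd`, while dropping the constant `type` and the identifier-like `label`.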

Once this preprocessing is complete, Explain-a-LOD moves to the second stage and automatically formulates hypotheses based on the enriched statistics. The methods employed include simple correlation analyses as well as other methods for recognizing regularities in statistical data, which allow more complex hypotheses covering more than a single attribute to be formulated. The resulting hypotheses are then presented to users as phrases such as "OECD-member countries have low corruption indices" if the data show a corresponding correlation between the attribute "OECD membership" and the target attribute "corruption index", even if the original statistics contained no reference to countries' OECD membership. Explain-a-LOD takes that background knowledge into account automatically.
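The simplest form of this second stage, a correlation analysis that turns each strongly correlated attribute into a sentence, might look like the sketch below. The choice of the Pearson coefficient, the threshold of 0.7, the minimum of three rows and the sentence template are all assumptions made for illustration; the multi-attribute methods mentioned above are not shown, and the function names `pearson` and `hypotheses` are hypothetical.

```python
import math
from typing import Any, Dict, List, Optional


def pearson(xs: List[float], ys: List[float]) -> Optional[float]:
    """Pearson correlation coefficient; None if undefined (fewer than two points or zero variance)."""
    n = len(xs)
    if n < 2:
        return None
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def hypotheses(rows: List[Dict[str, Any]], target: str,
               threshold: float = 0.7, min_rows: int = 3) -> List[str]:
    """Stage 2 sketch: one single-attribute hypothesis per strongly correlated attribute."""
    out = []
    attrs = sorted({a for row in rows for a in row} - {target})
    for attr in attrs:
        # Use only rows where both values are present and numeric (bools count as 0/1).
        pairs = [(row[attr], row[target]) for row in rows
                 if attr in row and target in row
                 and isinstance(row[attr], (bool, int, float))
                 and isinstance(row[target], (int, float))]
        if len(pairs) < min_rows:
            continue  # too few observations to say anything
        corr = pearson([float(x) for x, _ in pairs], [float(y) for _, y in pairs])
        if corr is None or abs(corr) < threshold:
            continue  # correlation undefined or too weak
        level = "high" if corr > 0 else "low"
        if all(isinstance(x, bool) for x, _ in pairs):
            subject = f"Entities with {attr} = true"
        else:
            subject = f"Entities with high {attr}"
        out.append(f"{subject} have {level} {target} (r = {corr:.2f})")
    return out
```

Treating a Boolean attribute such as `in_oecd` as 0/1 makes the Pearson coefficient equal to the point-biserial correlation, so yes/no background facts and numeric ones can be scored the same way. Non-numeric attributes (e.g. the country name) are simply skipped.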

Surprising and useful hypotheses

Paulheim and his colleagues have thoroughly tested their approach on various kinds of statistics, including Mercer's standard-of-living study and Transparency International's corruption index. Paulheim noted: "What one obtains is a mixture of obvious and surprising hypotheses, such as 'cities where temperatures do not exceed 21°C in May have high standards of living,' 'capital cities generally have lower standards of living than other cities,' or 'countries that have few schools and few radio stations have high corruption indices.'" An evaluation of the results by test subjects confirmed that impression. "The test subjects perceived the resulting hypotheses as largely surprising and nontrivial, and very frequently as useful," Paulheim added. However, the test subjects had serious doubts about the trustworthiness of the hypotheses, which, Paulheim noted, is also attributable to the poor quality of some of the data in the open-data cloud.

Explain-a-LOD has been presented at several international conferences over the past few months. The tool received the "Best In-Use Paper" and "Best Demo" awards at the Extended Semantic Web Conference 2012, held in Crete in late May. Several extensions of Explain-a-LOD are planned, among them further attribute-generation algorithms and facilities for accessing additional data sources in the LOD cloud.

Further information: http://www.ke.tu-darmstadt.de/resources/explain-a-lod


Story Source:

The above story is based on materials provided by Technische Universität Darmstadt. Note: Materials may be edited for content and length.


Cite This Page:

Technische Universität Darmstadt. "Getting to the bottom of statistics: Software utilizes data from the Internet for interpreting statistics." ScienceDaily. ScienceDaily, 16 July 2012. <www.sciencedaily.com/releases/2012/07/120716091925.htm>.
Technische Universität Darmstadt. (2012, July 16). Getting to the bottom of statistics: Software utilizes data from the Internet for interpreting statistics. ScienceDaily. Retrieved April 20, 2014 from www.sciencedaily.com/releases/2012/07/120716091925.htm
Technische Universität Darmstadt. "Getting to the bottom of statistics: Software utilizes data from the Internet for interpreting statistics." ScienceDaily. www.sciencedaily.com/releases/2012/07/120716091925.htm (accessed April 20, 2014).
