Featured Research

from universities, journals, and other organizations

A simple solution for big data: New algorithm simplifies the categorization of data

Date:
June 26, 2014
Source:
International School of Advanced Studies (SISSA)
Summary:
Categorizing and representing huge amounts of data -- we're talking about peta- or even exabytes of information -- synthetically is a challenge of the future. A research paper proposes an efficient procedure to face up to this challenge.

These are images used to test the algorithm.
Credit: SISSA

Experts use the expression big data to indicate huge amounts of information, such as those (photos, videos, texts, but also other more technical types of data) shared at any time by billions of people on computers, smartphones and other electronic devices. The present-day scenario offers unprecedented perspectives: tracking flu epidemics, monitoring road traffic in real time, or handling the emergency of natural disasters, for example. For us to be able to use these huge amounts of data, we have to understand them and before that we need to categorize them in an effective, fast and automatic manner.

One of the most commonly used systems is a series of statistical techniques called Cluster Analysis (CA), which is able to group data sets according to their "similarity." Two researchers from SISSA devised a type of CA based on simple and powerful principles, which proved to be very efficient and capable of solving some of the most typical problems encountered in this type of analysis.

Data sets can be imagined as "clouds" of data points in a multidimensional space. These points are generally differently distributed: more widely scattered in one area and denser in another. CA is used to identify the denser areas efficiently, grouping the data in a certain number of significant subsets on the basis of this criterion. Each subset corresponds to a category.

"Think of a database of facial photographs," explains Alessandro Laio, professor of Statistical and Biological Physics at SISSA. "The database may contain more than one photo of the same person, so CA is used to group all the pictures of the same individual. This type of analysis is carried out by automatic facial recognition systems, for example."

"We tried to devise a more efficient algorithm than those currently used, and one capable of solving some of the classic problems of CA," continues Laio.

More in detail…

"Our approach is based on a new way of identifying the centre of the cluster, i.e., the subsets," explains Alex Rodriguez, co-author of the paper. "Imagine having to identify all the cities in the world, without having access to a map. A huge task," says Rodriguez. "We therefore identified a heuristic, that is, a simple rule or a sort of shortcut to achieve the result."

To find out if a place is a city we can ask each inhabitant to count his "neighbours," in other words, how many people live within 100 metres of his house. Once we have this number, we then go on to find, for each inhabitant, the shortest distance at which another inhabitant with a greater number of neighbours lives. "Together, these two data," explains Laio, "tell us how densely populated is the area where an individual lives and the distance between individuals who have the most neighbours. By automatically cross-checking these data, for the entire world population, we can identify the individuals who represent the centres of the clusters, which correspond to the various cities." "Our algorithm performs precisely this kind of calculation, and it can be applied to many different settings," adds Rodriguez.
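The two-step calculation described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the function name `density_peaks` and the parameters `cutoff` (the "100 metres") and `n_centres` are hypothetical, and ranking candidates by the product of the two numbers, with ties broken by point index, is an assumption made here to pick the centres automatically. It compares every pair of points, so it only suits small data sets.

```python
import math

def density_peaks(points, cutoff, n_centres):
    """Illustrative sketch: neighbour counts plus distance to a denser point."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # rho[i]: how many other points lie within `cutoff` of point i (its "neighbours").
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < cutoff)
           for i in range(n)]

    # Rank points from densest to sparsest; ties are broken by index (an assumption).
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    # delta[i]: shortest distance to a point ranked denser than i.
    # The densest point has no denser neighbour, so it gets its largest distance.
    delta = [0.0] * n
    nearest_denser = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        nearest_denser[i] = j

    # Centres are points that are both dense and far from anything denser.
    # Sorting `order` (stable sort) keeps the densest point first among ties.
    centres = sorted(order, key=lambda i: -rho[i] * delta[i])[:n_centres]

    # Every remaining point joins the cluster of its nearest denser point.
    label = [-1] * n
    for c, i in enumerate(centres):
        label[i] = c
    for i in order:
        if label[i] == -1:
            label[i] = label[nearest_denser[i]]
    return rho, delta, centres, label

# Two small "cities": four corner points around a middle point, far apart.
town_a = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]
town_b = [(x + 10, y + 10) for x, y in town_a]
rho, delta, centres, label = density_peaks(town_a + town_b, cutoff=1.0, n_centres=2)
print(centres)  # [4, 9]: the middle point of each town
print(label)    # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

In this toy data set the middle point of each town has the most neighbours and is far from any denser point, so it is picked as the centre, and the remaining points follow their nearest denser neighbour into the same cluster.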

The procedure performed well in tests: "We tested our mathematical model on the Olivetti Face Database, an archive of facial photographs, obtaining highly satisfactory results. The system recognised most individuals correctly, and never produced 'false positive' results," comments Rodriguez. "This means that in some cases it failed to recognise a subject, but it never once confused one individual with another. Compared to other similar methods, ours was particularly effective in eliminating outliers, that is, those data points that are so very different from the others that they tend to skew the analysis."


Story Source:

The above story is based on materials provided by International School of Advanced Studies (SISSA). Note: Materials may be edited for content and length.


Journal Reference:

  1. A. Rodriguez, A. Laio. Clustering by fast search and find of density peaks. Science, 2014; 344 (6191): 1492 DOI: 10.1126/science.1242072

Cite This Page:

International School of Advanced Studies (SISSA). "A simple solution for big data: New algorithm simplifies the categorization of data." ScienceDaily. ScienceDaily, 26 June 2014. <www.sciencedaily.com/releases/2014/06/140626141650.htm>.
International School of Advanced Studies (SISSA). (2014, June 26). A simple solution for big data: New algorithm simplifies the categorization of data. ScienceDaily. Retrieved November 26, 2014 from www.sciencedaily.com/releases/2014/06/140626141650.htm
International School of Advanced Studies (SISSA). "A simple solution for big data: New algorithm simplifies the categorization of data." ScienceDaily. www.sciencedaily.com/releases/2014/06/140626141650.htm (accessed November 26, 2014).
