
How hard is it to 'de-anonymize' cellphone data?

Date: March 27, 2013
Source: Massachusetts Institute of Technology
Summary:
Scientists analyzed data on 1.5 million cellphone users in a small European country over a span of 15 months and found that just four points of reference, with fairly low spatial and temporal resolution, were enough to uniquely identify 95 percent of them. This means that to extract the complete location information for a single person from an “anonymized” data set of more than a million people, all you would need to do is place him or her within a couple of hundred yards of a cellphone transmitter, at some point within a given hour, four times in one year. A few Twitter posts would probably provide all the information you would need, if they contained specific information about the person’s whereabouts.

Credit: Rendering by Christine Daniloff/MIT of an original image by Yves-Alexandre de Montjoye et al.

The proliferation of sensor-studded cellphones could lead to a wealth of data with socially useful applications -- in urban planning, epidemiology, operations research and emergency preparedness, among other things. Of course, before being released to researchers, the data would have to be stripped of identifying information. But how hard could it be to protect the identity of one unnamed cellphone user in a data set of hundreds of thousands or even millions?

According to a paper appearing this week in Scientific Reports, harder than you might think. Researchers at MIT and the Université Catholique de Louvain, in Belgium, analyzed data on 1.5 million cellphone users in a small European country over a span of 15 months and found that just four points of reference, with fairly low spatial and temporal resolution, were enough to uniquely identify 95 percent of them.

In other words, to extract the complete location information for a single person from an "anonymized" data set of more than a million people, all you would need to do is place him or her within a couple of hundred yards of a cellphone transmitter, at some point within a given hour, four times in one year. A few Twitter posts would probably provide all the information you would need, if they contained specific information about the person's whereabouts.

The first author on the paper is Yves-Alexandre de Montjoye, a graduate student in the research group of Toshiba Professor of Media Arts and Sciences Sandy Pentland. He's joined by César Hidalgo, an assistant professor of media arts and sciences; Vincent Blondel, a visiting professor at MIT and a professor of applied mathematics at Université Catholique; and Michel Verleysen, a professor of electrical engineering at Université Catholique.

Focusing the debate

Hidalgo's group specializes in applying the tools of statistical physics to a wide range of subjects, from communications networks to genetics to economics. In this case, he and de Montjoye were able to use those tools to uncover a simple mathematical relationship between the resolution of spatiotemporal data and the likelihood of identifying a member of a data set.

According to their formula, the probability of identifying someone goes down as the resolution of the measurements decreases, but more slowly than you might think. Even reporting the time of each measurement only to within a 15-hour window, or its location only to within a group of 15 adjacent cell towers, would still allow half the people in the sample data set to be uniquely identified.
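Lowering the resolution of a trace, as described above, can be sketched in a few lines of Python. This is a minimal illustration, not the researchers' code: it assumes each trace is a list of (tower, hour) pairs, and the `tower_to_cluster` mapping that groups adjacent towers is a hypothetical input (real adjacency would come from tower geography, which the sketch does not model).

```python
from typing import Dict, List, Tuple

# A trace is a list of (tower_id, hour_index) observations: location is the
# connected tower, time is a one-hour slot, as in the data set described.
Point = Tuple[int, int]


def coarsen_trace(
    trace: List[Point],
    tower_to_cluster: Dict[int, int],
    hours_per_bin: int,
) -> List[Point]:
    """Reduce the spatial and temporal resolution of one trace.

    tower_to_cluster (hypothetical) maps each tower to a group of adjacent
    towers; hours_per_bin merges consecutive one-hour slots into wider bins.
    Points that become identical are kept once, in order of first appearance.
    """
    seen = set()
    coarse: List[Point] = []
    for tower, hour in trace:
        point = (tower_to_cluster[tower], hour // hours_per_bin)
        if point not in seen:
            seen.add(point)
            coarse.append(point)
    return coarse


if __name__ == "__main__":
    # Toy assumption: towers with nearby IDs are adjacent, grouped 15 at a time.
    clusters = {t: t // 15 for t in range(30)}
    print(coarsen_trace([(3, 2), (7, 10), (20, 40), (3, 14)], clusters, 15))
    # prints [(0, 0), (1, 2)]
```

Coarsening merges points that used to be distinct, which is what makes traces harder to tell apart; the study's finding is that this protection grows only slowly as the bins widen.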

But while its initial application may be discouraging, de Montjoye and Hidalgo hope that their formula will provide a way for researchers and policy analysts to reason more rigorously about the privacy safeguards that need to be put in place when they're working with aggregated location data.

"Both César and I deeply believe that we all have a lot to gain from this data being used," de Montjoye says. "This formula is something that could be useful to help the debate and decide, OK, how do we balance things out, and how do we make it a fair deal for everyone to use this data?"

Everybody's different

In the data set that the researchers analyzed, the location of a cellphone was inferred solely from that of the cell tower it was connected to, and the time of the connection was given as falling within a one-hour interval. Each cellphone had a unique, randomly generated identifying number, so that its movement could be traced over time. But there was no information connecting that number to the phone's owner.

The researchers randomly selected a representative sample of the 1.5 million cellphone traces and, for each trace in the sample, chose points at random. For 95 percent of the traces, just four randomly selected points were enough to distinguish them from every other trace in the database. In the worst (or, from another perspective, best) case, 11 points were necessary.
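The sampling procedure described above can be sketched as follows. This is a hedged reconstruction from the article's description, not the authors' implementation: traces are assumed to be a dictionary from the random phone ID to a list of (tower, hour) points, and the function names `is_unique` and `unicity` are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Point = Tuple[int, int]  # (tower_id, hour_index) at one-hour resolution


def is_unique(traces: Dict[int, List[Point]], user: int, p: int,
              rng: random.Random) -> bool:
    """Draw p distinct random points from one user's trace and return True
    if no other trace in the data set contains all of them."""
    candidates = sorted(set(traces[user]))
    chosen = set(rng.sample(candidates, p))
    matches = [uid for uid, trace in traces.items() if chosen <= set(trace)]
    return matches == [user]


def unicity(traces: Dict[int, List[Point]], p: int, sample_size: int,
            seed: int = 0) -> float:
    """Estimate the fraction of users uniquely pinned down by p random points."""
    rng = random.Random(seed)
    # Only traces with at least p distinct points can supply p random points.
    eligible = [uid for uid, trace in traces.items() if len(set(trace)) >= p]
    if not eligible:
        raise ValueError("no trace has at least p distinct points")
    users = rng.sample(eligible, min(sample_size, len(eligible)))
    hits = sum(is_unique(traces, uid, p, rng) for uid in users)
    return hits / len(users)
```

The linear scan over every trace is fine for toy data; on 1.5 million traces one would more plausibly index traces by point and intersect the candidate sets.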

"There's a concern with this data, to what extent can we preserve anonymity," says Luis Bettencourt, a professor at the Santa Fe Institute who studies social systems. "What they are showing here, quite clearly, is that it's very hard to preserve anonymity."

But for Bettencourt, the uniqueness of people's trajectories through cities is itself precisely the type of information that analysis of cellphone data is meant to uncover. "This is interesting, from a scientific point of view, to understand how people use urban space," Bettencourt says. "It shows what kind of social systems cities are."

The researchers suspect that similar relationships might hold for other types of data. "I would not be surprised if a similar result -- maybe requiring more points -- would, for example, extend to web browsing," Hidalgo says. "The space of potential combinations is really large. When a person is, in some sense, being expressed in a space in which the total number of combinations is huge, the probability that two people would have the same exact trajectory -- whether it's walking or browsing -- is almost nil."
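Hidalgo's point about the size of the combination space can be made concrete with back-of-the-envelope arithmetic. The tower count below is a hypothetical figure chosen for illustration, not one reported by the study, and 15 months is approximated as 15 months of 30 days each, counted in one-hour slots.

```python
from math import comb

# Hypothetical figures for illustration only; not taken from the study.
TOWERS = 1_000            # hypothetical number of cell towers
HOURS = 15 * 30 * 24      # ~15 months of one-hour slots (30-day months)
POPULATION = 1_500_000    # size of the data set described in the article

distinct_points = TOWERS * HOURS             # possible (tower, hour) pairs
four_point_sets = comb(distinct_points, 4)   # unordered sets of four points

print(f"distinct (tower, hour) points: {distinct_points:,}")
print(f"possible four-point sets:      {four_point_sets:.2e}")
print(f"sets per person:               {four_point_sets / POPULATION:.2e}")
```

Real movement is concentrated on a few places, such as home and work, so people do not spread uniformly over this space; that is why the researchers measured uniqueness on actual traces rather than relying on a count like this.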


Story Source:

The above story is based on materials provided by Massachusetts Institute of Technology. The original article was written by Larry Hardesty. Note: Materials may be edited for content and length.


Journal Reference:

  1. Yves-Alexandre de Montjoye, César A. Hidalgo, Michel Verleysen, Vincent D. Blondel. Unique in the Crowd: The privacy bounds of human mobility. Scientific Reports, 2013; 3. DOI: 10.1038/srep01376

Cite This Page:

Massachusetts Institute of Technology. "How hard is it to 'de-anonymize' cellphone data?." ScienceDaily. ScienceDaily, 27 March 2013. <www.sciencedaily.com/releases/2013/03/130327132547.htm>.
