This talk is based on recent papers we published in Science, Nature Scientific Reports, and IEEE Data Engineering on the privacy challenges of large-scale behavioral data. This work, done at the MIT Media Lab, shows that metadata from mobile phones and credit cards might not be as anonymous as we think. At a time when tremendous amounts of user data are becoming available, understanding the limits of an individual’s privacy will be crucial in the design of both future policies and information technologies. This work has been covered by the World Economic Forum (WEF), the BBC, CNN, GigaOm, Wired, and Technology Review.
In this talk, I will show how four points (approximate places and times) are enough to identify 95% of individuals in a mobility database of 1.5 million people, and 90% of individuals in a credit card database of 1 million people. This means that identifying people in a large-scale metadata database is likely to be easy, even though no “private” information, such as names, e-mail addresses, phone numbers, or account numbers, was ever collected. Metadata thus truly acts as a fingerprint, and this digital fingerprint turns out to be even more distinctive than a traditional physical fingerprint.
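The measure behind these percentages, often called unicity, can be sketched as follows, assuming each person's trace is a set of (place, time) points: draw p points at random from a person's trace and count how many traces in the dataset contain all of them; the person is unique if only their own trace matches, and unicity is the fraction of people who are unique. The helpers `unicity` and `synthetic_traces`, the uniform random data, and all parameter values below are hypothetical and for illustration only. This is a minimal sketch of the idea, not the authors' code, and its output will not reproduce the figures quoted above.

```python
import random

def unicity(traces, p, n_samples=None, seed=0):
    """Estimate the fraction of users uniquely identified by p random points.

    traces: dict mapping a user id to a set of (place, time) points.
    """
    rng = random.Random(seed)
    users = list(traces)
    if n_samples is not None:
        # Optionally evaluate only a random subset of users.
        users = rng.sample(users, min(n_samples, len(users)))
    unique = evaluated = 0
    for user in users:
        points = traces[user]
        if len(points) < p:
            continue  # cannot draw p distinct points from this trace
        # The p points an adversary is assumed to know about this user.
        known = set(rng.sample(sorted(points), p))
        # Count every trace (including the user's own) containing all known points.
        matches = sum(1 for t in traces.values() if known <= t)
        evaluated += 1
        if matches == 1:
            unique += 1
    return unique / evaluated if evaluated else 0.0

def synthetic_traces(n_users, n_points, n_places, n_hours, seed=1):
    """Hypothetical uniform random traces, for illustration only."""
    rng = random.Random(seed)
    return {
        u: {(rng.randrange(n_places), rng.randrange(n_hours)) for _ in range(n_points)}
        for u in range(n_users)
    }

traces = synthetic_traces(n_users=500, n_points=20, n_places=50, n_hours=24 * 7)
for p in (1, 2, 3, 4):
    print(f"p={p}: unicity {unicity(traces, p):.2f}")
```

This brute-force version compares every sampled user against every trace, so its cost grows quadratically with the number of users; a dataset with millions of people would call for an index from each point to the users who visited it.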
I will further show how human behavior puts fundamental constraints on the privacy of individuals, and why traditional data protection schemes are outdated. These constraints hold even when the resolution of the dataset is low: in both the mobility and the credit card data, even coarse datasets provide little anonymity. Using large-scale data, I will show how we derived a formula to estimate the uniqueness of human mobility traces. This formula can be used as a rule of thumb to estimate the privacy of a dataset from its spatial and temporal resolution.
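The effect of resolution can be explored empirically by coarsening a dataset and re-measuring how many people remain unique. The sketch below assumes traces are sets of (place id, hour) points and lowers the resolution by merging every v place ids and every h hours into one bin; integer division only merges nearby places if ids are ordered by location, which is an illustrative simplification. The helpers `coarsen` and `unicity` and the uniform synthetic data are hypothetical. This is not the formula from the talk, only a way to measure the trend on a given dataset.

```python
import random

def coarsen(traces, v, h):
    """Lower resolution: merge every v place ids and every h hours into one bin."""
    return {u: {(place // v, hour // h) for place, hour in pts} for u, pts in traces.items()}

def unicity(traces, p, seed=0):
    """Fraction of users whose p randomly drawn points match only their own trace."""
    rng = random.Random(seed)
    unique = evaluated = 0
    for points in traces.values():
        if len(points) < p:
            continue  # trace too short after coarsening to draw p distinct points
        known = set(rng.sample(sorted(points), p))
        evaluated += 1
        if sum(1 for t in traces.values() if known <= t) == 1:
            unique += 1
    return unique / evaluated if evaluated else 0.0

# Hypothetical uniform data: 300 users, 30 points each, 200 places, one week of hours.
rng = random.Random(42)
traces = {u: {(rng.randrange(200), rng.randrange(24 * 7)) for _ in range(30)} for u in range(300)}

for v, h in [(1, 1), (2, 2), (5, 4), (10, 12)]:
    print(f"v={v:>2}, h={h:>2}: unicity with p=4 is {unicity(coarsen(traces, v, h), p=4):.2f}")
```

Uniform random data behaves differently from real human mobility, so any trend printed here should be read as a demonstration of the measurement, not as evidence about real datasets.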
This data is, however, of great value, and all of us (users, companies, and scientists) have a lot to gain from its use. There is far more to mobile phone, credit card, or wearable data than privacy concerns alone. It is therefore of tremendous importance to understand how to use this data while preserving people’s privacy. I hope this talk will help attendees understand what is and is not possible when it comes to privacy. I will conclude by discussing some of the legal and technical solutions we are currently developing at the Media Lab.
Yves-Alexandre de Montjoye is a lecturer at Imperial College London, a research scientist at the MIT Media Lab, and a postdoctoral researcher at Harvard IQSS. His research aims to understand how the unicity of human behavior impacts the privacy of individuals, through reidentification or inference, in large-scale metadata datasets such as mobile phone, credit card, or browsing data. Previously, he was a researcher at the Santa Fe Institute in New Mexico, worked for the Boston Consulting Group, and acted as an expert for both the Bill and Melinda Gates Foundation and the United Nations. Yves-Alexandre was recently named an innovator under 35 for Belgium. His research has been published in Science and Nature Scientific Reports, has been covered by the BBC, CNN, the New York Times, the Wall Street Journal, Harvard Business Review, Le Monde, Der Spiegel, Die Zeit, and El País, and has been presented in his TEDx talks. His work on the shortcomings of anonymization has appeared in reports of the World Economic Forum, United Nations, OECD, FTC, and the European Commission. He is a member of the OECD Advisory Group on Health Data Governance. Yves-Alexandre holds a PhD in computational privacy from MIT, an MSc in applied mathematics from Louvain, an MSc (centralien) from École Centrale Paris, an MSc in mathematical engineering from KU Leuven, and a BSc in engineering from Louvain.
©2015, O'Reilly Media, Inc. All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.