Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

Anonymized data fusion: Privacy versus utility

Behrooz Hashemian (Massachusetts Institute of Technology)
4:35pm–5:15pm Wednesday, September 27, 2017

Who is this presentation for?

  • Data scientists, data administrators, data architects, and data enthusiasts

What you'll learn

  • Learn a new anonymized data fusion technique that creates collective insights from various anonymized datasets


As digital technologies become more pervasive in our daily lives, we are leaving a growing number of digital traces, including cell phone data, health records, public transportation trajectories, credit card transactions, and connected car communications. Each dataset, although anonymized, tells a story about a different aspect of human life and can be used to leverage business resources in specific dimensions, such as optimizing mobility systems, predicting economic growth, and forecasting customer purchases. Since the data is anonymized and all identifying information is removed, the information is limited to each individual dataset, and there is no common field or variable that allows datasets to be merged. However, the patterns in people’s behavior can offer a basis for fusing data from different sources at the individual level and provide valuable new insights into human behavior, leading to more effective and useful products and services that mutually benefit businesses and customers without compromising the privacy of individuals.

Behrooz Hashemian shares a novel paradigm to combine multiple anonymized datasets through pattern recognition and statistical learning techniques. This data fusion technique is based on a fundamental concept: although people’s identities are fully anonymized, the environment that they are interacting with is not, making it possible to generate new meta-information from anonymized individual trajectories and allow information from multiple sources to complement and enrich each other without compromising people’s privacy. These linked datasets establish a collective knowledge platform that helps to build solutions and make informed decisions.
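
The abstract does not spell out the matching procedure, so the following is only a minimal sketch of the general idea, not the speaker's method. It assumes each anonymized dataset can be reduced to (pseudonym, place, hour) records, and it links pseudonyms across two datasets when the sets of places and times they visited overlap strongly. The function names, the Jaccard similarity measure, the 0.5 threshold, and the toy records are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def visited_cells(records):
    """Collect the set of (place, hour) cells seen for each pseudonym."""
    cells = defaultdict(set)
    for pseudonym, place, hour in records:
        cells[pseudonym].add((place, hour))
    return dict(cells)


def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_datasets(records_a, records_b, threshold=0.5):
    """Map each pseudonym in A to its best-matching pseudonym in B.

    A link is kept only if the best similarity reaches `threshold`
    (an assumed cut-off, not a value from the talk).
    """
    cells_a = visited_cells(records_a)
    cells_b = visited_cells(records_b)
    links = {}
    for id_a, set_a in cells_a.items():
        best_id, best_score = None, 0.0
        for id_b, set_b in cells_b.items():
            score = jaccard(set_a, set_b)
            if score > best_score:
                best_id, best_score = id_b, score
        if best_id is not None and best_score >= threshold:
            links[id_a] = (best_id, best_score)
    return links


# Hypothetical toy data: the pseudonyms differ between datasets,
# but the environment (zones and hours) is shared.
transit = [
    ("t1", "zone_A", 8), ("t1", "zone_B", 18),
    ("t2", "zone_C", 9), ("t2", "zone_D", 17),
]
purchases = [
    ("c9", "zone_A", 8), ("c9", "zone_B", 18), ("c9", "zone_E", 13),
    ("c7", "zone_C", 9), ("c7", "zone_D", 17),
]

links = link_datasets(transit, purchases)
for pseudonym, (match, score) in links.items():
    print(pseudonym, "->", match, round(score, 2))
```

Real trajectories are far noisier than this toy data, so a practical system would likely need fuzzier spatial and temporal matching and a statistical model of match confidence rather than a fixed threshold.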

Behrooz also addresses the serious privacy concerns the technique raises. What if one of the datasets contains identifiable information? This may allow other anonymized datasets to be re-identified and cause a privacy breach. This form of de-anonymization not only challenges current anonymization techniques and policies, which rely on single-dataset information, but also warns of the unpredictable consequences of publishing de-identified data. The issue calls for the development of new security and privacy policies as well as a new privacy-guaranteed way of interacting with data.
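
To make the re-identification concern concrete, here is a small hedged sketch. It assumes a linkage between two datasets has already been established by some means and that one of the two datasets turns out to carry real identities. The pseudonyms, names, and the `propagate_identities` helper are hypothetical and exist only to show how identities spread across a link.

```python
def propagate_identities(links, identified):
    """Return the anonymized pseudonyms that become identifiable.

    links:      pseudonym in the anonymized dataset -> pseudonym in the other dataset
    identified: pseudonym in the other dataset -> real identity (where known)
    """
    return {
        anon: identified[other]
        for anon, other in links.items()
        if other in identified
    }


# Hypothetical linkage and a partially identified second dataset.
links = {"t1": "c9", "t2": "c7", "t3": "c4"}
identified = {"c9": "Alice Example", "c7": "Bob Example"}

exposed = propagate_identities(links, identified)
exposure_rate = len(exposed) / len(links)

print(exposed)
print(f"{exposure_rate:.0%} of linked pseudonyms re-identified")
```

The point of the sketch is that the "anonymized" dataset is never modified: the breach comes entirely from the link plus the second dataset, which is why a policy that evaluates each dataset in isolation can miss it.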

Behrooz Hashemian

Massachusetts Institute of Technology

Behrooz Hashemian is a researcher and chief data officer at MIT’s Senseable City Lab, where he investigates the innovative implementation of big data analytics and artificial intelligence in smart cities, finance, and healthcare. A data scientist with expertise in developing predictive analytics strategies, machine learning solutions, and data-driven platforms for informed decision making, Behrooz endeavors to bridge the gap between academic research and industrial deployment of big data analytics and artificial intelligence. He is also leading an unprecedented project on anonymized data fusion, which provides multidimensional insight into urban activities and customer behaviors from multiple sources.