Presented By O'Reilly and Cloudera
Make Data Work
31 May–1 June 2016: Training
1 June–3 June 2016: Conference
London, UK

Protecting individual privacy in a data-driven world

Jason McFall (Privitar)
14:55–15:35 Thursday, 2/06/2016
Law, ethics, governance
Location: Capital Suite 17 Level: Non-technical
Average rating: 4.46 (13 ratings)

As data practitioners, we come to Strata because we are excited by the opportunities to unlock the value in data. But as individuals, we are each sensitive to how our own data is used, and we want our privacy to be respected. We expect organizations to keep our data secure, but we also expect them to use our data ethically and not exploit or leak it. Many citizens are simply unaware of the degree to which their trails of data can reveal highly private information. Meanwhile, organizations are not doing enough to preserve privacy: they need privacy-preserving ways to analyze and operationalize data.

Organizations may be open to far greater liability from customer reidentification than they realize. Jason McFall surveys the risks around private data and discusses examples of privacy breaches in which well-meaning, responsible organizations inadvertently violated privacy because they didn't understand the threats they faced. These threats include linkage attacks, where joining a dataset to a public dataset can reveal identities; network graph matching, where an attacker identifies a segment of a graph (such as a social graph) and then walks the graph; and aggregation risks, where a single data point seems innocuous in isolation but many together reveal very private information. Real-world examples include mining social network comments, likes, and friend graphs; connecting location data to learn where a person lives, works, and travels; and exploiting Internet of Things data.
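To make the linkage-attack threat concrete, here is a minimal Python sketch, with entirely invented data, of the classic pattern: a dataset has names removed but keeps quasi-identifiers (ZIP code, birth date, sex), and joining those against a public record such as a voter roll re-identifies individuals. The datasets, field names, and `link` helper are all hypothetical illustrations, not anything from the talk.

```python
# Invented "anonymized" records: names removed, quasi-identifiers kept.
medical = [
    {"zip": "02138", "dob": "1945-07-31", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "dob": "1972-01-02", "sex": "M", "diagnosis": "asthma"},
]

# Invented public record: names alongside the same quasi-identifiers.
voter_roll = [
    {"name": "A. Smith", "zip": "02138", "dob": "1945-07-31", "sex": "F"},
    {"name": "B. Jones", "zip": "02140", "dob": "1980-03-14", "sex": "M"},
]

def link(medical, voter_roll):
    """Join the two datasets on the quasi-identifier triple (zip, dob, sex)."""
    key = lambda r: (r["zip"], r["dob"], r["sex"])
    voters = {key(v): v["name"] for v in voter_roll}
    # Any medical record whose triple appears in the voter roll is re-identified.
    return [(voters[key(m)], m["diagnosis"])
            for m in medical if key(m) in voters]

print(link(medical, voter_roll))  # → [('A. Smith', 'hypertension')]
```

The join succeeds because the quasi-identifier triple is unique enough to act as a de facto primary key, which is exactly why removing names alone is not anonymization.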

Jason outlines techniques that enable the safe and effective use of data while preserving privacy, including tokenization and masking; generalization and blurring of data (such as k-anonymity); controlled, privacy-preserving querying of data (such as differential privacy); homomorphic encryption; and randomized response for the IoT. He explores the strengths and weaknesses of these approaches before closing with key lessons about privacy for individual citizens, organizations, and data scientists.
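Of the techniques listed above, randomized response is compact enough to sketch. The toy Python example below shows the classic two-coin mechanism (a standard construction, not code from the talk): each respondent answers truthfully only half the time, so no individual answer is incriminating, yet the population rate can still be recovered because P(yes) = 0.5·p + 0.25.

```python
import random

def randomized_response(truth: bool) -> bool:
    # First coin: heads (prob 0.5) -> answer truthfully.
    if random.random() < 0.5:
        return truth
    # Tails -> answer with a second, independent coin flip.
    return random.random() < 0.5

def estimate_true_rate(answers):
    # Invert P(yes) = 0.5 * p + 0.25  =>  p = 2 * observed - 0.5.
    observed = sum(answers) / len(answers)
    return 2 * observed - 0.5

random.seed(0)
population = [True] * 300 + [False] * 700   # true rate: 0.30
answers = [randomized_response(t) for t in population]
print(round(estimate_true_rate(answers), 2))  # estimate should land near 0.30
```

Each individual retains plausible deniability (any "yes" may have come from the coin), which is the same local-privacy idea behind randomized collection from IoT devices.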

Jason McFall


Jason McFall is the CTO at Privitar, a London startup using machine learning and statistical techniques to open up data for safe secondary use, without violating individual privacy. Jason has a background in applying machine learning to marketing automation and customer analytics. Before that, he was an experimental physicist, working on particle physics collider experiments.