Presented By O’Reilly and Cloudera
Make Data Work
September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

Protecting sensitive data in huge datasets: Cloud tools you can use

Felipe Hoffa (Google), Damien Desfontaines (Google | ETH Zürich)
11:20am–12:00pm Wednesday, 09/12/2018
Secondary topics: Ethics and Privacy

Who is this presentation for?

  • Data scientists and data engineers

Prerequisite knowledge

  • Familiarity with SQL

What you'll learn

  • Learn how to identify PII in massive datasets
  • Explore k-anonymity, l-diversity, and related research and options such as removing, masking, and coarsening
  • Gain experience with practical demos over massive datasets


Before releasing a public dataset, practitioners need to thread the needle between the dataset's utility and the protection of individuals. Felipe Hoffa and Damien Desfontaines explore how to handle massive public datasets, taking you from theory to real life as they showcase newly available tools that help with PII detection and bring concepts like k-anonymity and l-diversity to the practical realm. You’ll also explore options such as removing, masking, and coarsening.
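As a rough illustration of the concepts named in the abstract (not the speakers' tooling, which this page does not describe), the following minimal Python sketch coarsens two quasi-identifiers and then measures k-anonymity and l-diversity on a toy table. The records, the column layout, and the helper names coarsen, k_anonymity, and l_diversity are all hypothetical and chosen only for this example.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (zip_code, age, diagnosis).
# zip_code and age are treated as quasi-identifiers; diagnosis is the sensitive value.
RECORDS = [
    ("10001", 34, "flu"),
    ("10002", 36, "asthma"),
    ("10003", 35, "flu"),
    ("10451", 52, "diabetes"),
    ("10452", 58, "diabetes"),
]


def coarsen(record):
    """Coarsen quasi-identifiers: keep a 3-digit ZIP prefix and a 10-year age band."""
    zip_code, age, diagnosis = record
    decade = age // 10 * 10
    return (zip_code[:3] + "**", f"{decade}-{decade + 9}", diagnosis)


def k_anonymity(records):
    """Return k: the size of the smallest group sharing the same quasi-identifiers."""
    groups = Counter(r[:2] for r in records)
    return min(groups.values())


def l_diversity(records):
    """Return l: the fewest distinct sensitive values found in any group."""
    sensitive = defaultdict(set)
    for *quasi, value in records:
        sensitive[tuple(quasi)].add(value)
    return min(len(values) for values in sensitive.values())


coarse = [coarsen(r) for r in RECORDS]
print("k before coarsening:", k_anonymity(RECORDS))
print("k after coarsening: ", k_anonymity(coarse))
print("l after coarsening: ", l_diversity(coarse))
```

On this toy data, coarsening raises k from 1 to 2, yet l stays at 1 because every record in one group shares the same diagnosis. That gap is the usual motivation for checking l-diversity in addition to k-anonymity.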

Related research: “Considerations for sensitive data within machine learning datasets”

Felipe Hoffa


Felipe Hoffa is a developer advocate for big data at Google, where he inspires developers around the world to leverage the Google Cloud Platform tools to analyze and understand their data in ways they never could before. You can find him in videos and blog posts and at conferences around the world.

Damien Desfontaines

Google | ETH Zürich

Damien Desfontaines protects personal data for a living. He’s a privacy engineer at Google, where he builds scalable anonymization tools, conducts privacy reviews, and translates high-level policies into technical best practices. He’s also a doctoral researcher at ETH Zürich, focusing on differential privacy, and he sometimes explains academic definitions of privacy in accessible terms on his blog.

Comments on this page are now closed.


chandrakala venkatesan | SYSTEM TEST ENGINEER
09/12/2018 7:53am EDT

Is there a way to get the presentation slide deck?