Sep 23–26, 2019

Getting ready for CCPA: Securing data lakes for heavy privacy regulation

Mark Donsky (Okera)
9:00am–12:30pm Tuesday, September 24, 2019
Location: 1E 15/16
Secondary topics: Privacy and Security

Who is this presentation for?

  • Security and DevOps professionals, admins, and data stewards




Many data lakes lack even the most basic security and governance controls. The complexity of multicompute, petabyte-scale security has proved daunting. Nonetheless, with the emergence of regulations such as the California Consumer Privacy Act (CCPA) and GDPR, organizations can no longer afford to overlook security and governance.

Mark Donsky walks you through securing a Hadoop cluster. You’ll start with a cluster with no security and add security features related to authentication, authorization, encryption of data at rest, encryption of data in transit, and complete data governance. For each security feature, you’ll cover the following topics:

  • What the security feature is, what protection it provides, and best practices and recommendations
  • How to enable the feature in a phased manner with the fewest growing pains and least risk
  • Why it’s important (demonstrated by live attacks against a cluster without the target security feature) and how it relates to GDPR
  • An overview of how the implementation is performed, where the moving parts are, and potential pitfalls

Prerequisite knowledge

  • A general understanding of data lake concepts and basic security principles, plus cloud concepts, such as S3, EMR, transient clusters, etc.

Materials or downloads needed in advance


What you'll learn

  • Gain a complete understanding of best practices for securing and governing cloud, on-premises, and hybrid deployments, including wire encryption, data-at-rest encryption, governance best practices, and unified, secure data catalogs for self-service discovery
  • Understand important aspects of GDPR and CCPA

Mark Donsky


Mark Donsky leads product management at Okera, a software company that provides discovery, access control, and governance at scale for today’s heterogeneous data environments. Previously, Mark led data management and governance solutions at Cloudera. Mark has held product management roles at companies such as Wily Technology, where he managed the flagship application performance management solution, and Silver Spring Networks, where he managed big data analytics solutions that reduced greenhouse gas emissions by millions of dollars annually. He holds a BS with honors in computer science from the University of Western Ontario.
