Securing your cloud data lake with a "defense in depth" approach
Who is this presentation for?
Data engineers, data architects, CIOs, and CISOs
Cloud providers have made it easy and inexpensive to store data in the cloud with object storage services such as S3 and ADLS. These services offer high availability, disaster recovery, and virtually unlimited scalability at a low price point of about $20/month per TB. As a result, companies often store many TBs or PBs of data on these systems.
A key challenge for most companies is securing this data so that only the users and applications that need it can access it. In this talk we explore a defense-in-depth approach to cloud data lake security, which includes:
- Authentication. Leveraging authentication protocols such as OAuth to authenticate users and applications, and taking advantage of various capabilities that enable applications to authenticate without any user intervention.
- Access control. Applying role-based access control, ACLs, and other restrictions (e.g., time-based limits) to control which users and applications can read and write data.
- Encryption. Leveraging encryption at rest (e.g., Azure SSE) to encrypt the data stored in the lake, and encryption in motion (e.g., TLS) to encrypt data being sent between an application and the data lake store, as well as utilizing the cloud’s key management service to store and rotate keys.
- Auditing. Monitoring access to the data in order to catch violators before they can cause much harm.
- Network security. Limiting the virtual networks and IP addresses from which the data lake store can be accessed.
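As a rough illustration of how several of these layers can be combined in one place, the sketch below builds an S3 bucket policy document that denies non-TLS requests, denies requests from outside a set of allowed IP ranges, and denies uploads that do not request SSE-KMS. It is not taken from the talk. The function name `build_bucket_policy`, the bucket name, and the CIDR range are hypothetical; the condition keys (`aws:SecureTransport`, `aws:SourceIp`, `s3:x-amz-server-side-encryption`) are standard AWS policy keys. The code only constructs the JSON document and does not call AWS.

```python
import json


def build_bucket_policy(bucket, allowed_cidrs, require_kms=True):
    """Return an S3 bucket policy (as a dict) made of layered Deny rules.

    Hypothetical helper for illustration only. Deny rules restrict access;
    read/write permissions still have to be granted separately (e.g., via IAM).
    """
    bucket_arn = f"arn:aws:s3:::{bucket}"
    object_arn = f"{bucket_arn}/*"
    statements = [
        {
            # Encryption in motion: reject any request not sent over TLS.
            "Sid": "DenyInsecureTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [bucket_arn, object_arn],
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        },
        {
            # Network security: reject requests from outside the allowed
            # public IP ranges. Traffic arriving through a VPC endpoint
            # would need a different condition key, not shown here.
            "Sid": "DenyOutsideAllowedNetworks",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [bucket_arn, object_arn],
            "Condition": {"NotIpAddress": {"aws:SourceIp": list(allowed_cidrs)}},
        },
    ]
    if require_kms:
        statements.append(
            {
                # Encryption at rest: reject uploads that do not ask for SSE-KMS.
                "Sid": "DenyUnencryptedUploads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": object_arn,
                "Condition": {
                    "StringNotEquals": {"s3:x-amz-server-side-encryption": "aws:kms"}
                },
            }
        )
    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    # Placeholder bucket name and documentation-only CIDR range.
    print(json.dumps(build_bucket_policy("example-lake", ["203.0.113.0/24"]), indent=2))
```

Authentication and auditing are configured through other services and are not represented in this document; an equivalent setup on Azure would use different mechanisms.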
In addition to demonstrating how to set up these critical security layers on AWS and Azure, we also show how data can be secured at a more granular level with column- and row-based access control as well as masking of PII data.
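To make the idea of granular controls concrete, here is a minimal in-memory sketch of row filtering, column restriction, and PII masking applied per role. It is an illustration under stated assumptions, not how any particular query engine or cloud service implements these features. The roles, column names, `POLICIES` table, and the `mask_email` and `apply_policy` helpers are all hypothetical.

```python
import hashlib


def mask_email(value):
    """Keep the domain, replace the local part with a short SHA-256 digest."""
    local, _, domain = value.partition("@")
    digest = hashlib.sha256(local.encode("utf-8")).hexdigest()[:8]
    return f"{digest}@{domain}"


# Hypothetical per-role policies: visible columns, a row predicate,
# and masking functions for sensitive columns.
POLICIES = {
    "analyst": {
        "columns": ["order_id", "region", "email"],
        "row_filter": lambda row: row["region"] == "EU",
        "masks": {"email": mask_email},
    },
    "admin": {
        "columns": ["order_id", "region", "email", "card_number"],
        "row_filter": lambda row: True,
        "masks": {},
    },
}


def apply_policy(rows, role):
    """Return only the rows and columns the role may see, with masks applied.

    An unknown role raises KeyError, so access is denied by default.
    """
    policy = POLICIES[role]
    visible = []
    for row in rows:
        # Row-level control: skip rows the role is not entitled to.
        if not policy["row_filter"](row):
            continue
        projected = {}
        # Column-level control: copy only the permitted columns.
        for col in policy["columns"]:
            mask = policy["masks"].get(col)
            # Masking: transform PII values instead of returning them raw.
            projected[col] = mask(row[col]) if mask else row[col]
        visible.append(projected)
    return visible
```

In this sketch an analyst sees only EU rows, never sees `card_number`, and receives email addresses with the local part hashed, while an admin sees the rows unchanged.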
By the end of this talk, you will have a deep understanding of the range of data lake security capabilities available in the main public clouds, as well as best practices for integrating these capabilities with the processing and analysis layers of the cloud data lake.
Prerequisite knowledge
Some experience with AWS or Azure will be highly beneficial.
What you'll learn
Tomer Shiran is cofounder and CEO of Dremio. Previously, Tomer was the vice president of product at MapR, where he was responsible for product strategy, roadmap, and new feature development. As a member of the executive team, he helped grow the company from 5 employees to over 300 employees and 700 enterprise customers. Prior to MapR, Tomer held numerous product management and engineering positions at Microsoft and IBM Research. He is the author of five US patents. Tomer holds an MS in electrical and computer engineering from Carnegie Mellon University and a BS in computer science from the Technion, the Israel Institute of Technology.
Jacques Nadeau is the cofounder and CTO of Dremio. Previously, he ran MapR’s distributed systems team; was CTO and cofounder of YapMap, an enterprise search startup; and held engineering leadership roles at Quigo, Offermatica, and aQuantive. Jacques is cocreator and PMC chair of Apache Arrow, a PMC member of Apache Calcite, a mentor for Apache Heron, and the founding PMC chair of the open source Apache Drill project.