Sep 23–26, 2019

Securing your cloud data lake with a "defense in depth" approach

Tomer Shiran (Dremio), Jacques Nadeau (Dremio)
2:05pm–2:45pm Thursday, September 26, 2019
Location: 1E 09

Who is this presentation for?

  • Data engineers, data architects, CIOs, and CISOs

Level

Intermediate

Description

Cloud providers have made it easy and inexpensive to store data in the cloud with object storage services such as S3 and ADLS. These services offer high availability, disaster recovery, and virtually unlimited scalability at a low price point of about $20 per month per TB. As a result, companies often store many TBs or PBs of data on these systems.

One of the key challenges faced by most companies is how to secure this data and make sure that only the users and applications that need the data have access to it. Tomer Shiran and Jacques Nadeau explore a defense-in-depth approach to cloud data lake security, which includes:

  • Authentication: leveraging authentication protocols such as OAuth to authenticate users and applications, and taking advantage of various capabilities that enable applications to authenticate without any user intervention
  • Access control: applying role-based access control, ACLs, and other restrictions (e.g., time) to control which users and applications can read and write data
  • Encryption: leveraging encryption at rest (e.g., Azure storage service encryption [SSE]) to encrypt the data stored in the lake and encryption in motion (e.g., transport layer security [TLS]) to encrypt data being sent between an application and the data lake store, as well as utilizing the cloud’s key management service to store and rotate keys
  • Auditing: monitoring access to the data in order to catch violators before they can cause much harm
  • Network security: limiting the virtual networks and IP addresses from which the data lake store can be accessed
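As a rough illustration of how these layers combine, the sketch below runs a request through each check independently and records the outcome in an audit log. It is not taken from the talk and does not use any cloud provider's API: the `Request` and `AuditLog` types, the `authorize` function, the token table, the role table, and the network range are all hypothetical stand-ins for what OAuth, IAM or ACL policies, TLS enforcement, and virtual network rules would provide in AWS or Azure.

```python
import ipaddress
from dataclasses import dataclass, field

# Hypothetical policy values for illustration only; none come from the talk.
ALLOWED_NETWORKS = [ipaddress.ip_network("10.0.0.0/16")]
VALID_TOKENS = {"token-alice": "alice", "token-etl": "etl-service"}
ROLE_PERMISSIONS = {"alice": {"read"}, "etl-service": {"read", "write"}}


@dataclass
class Request:
    source_ip: str
    token: str
    action: str  # "read" or "write"
    path: str
    uses_tls: bool = True


@dataclass
class AuditLog:
    entries: list = field(default_factory=list)

    def record(self, request, principal, decision):
        # Auditing layer: every decision is logged, including denials.
        self.entries.append((principal, request.action, request.path, decision))


def authorize(request, audit):
    """Return True only if every layer independently allows the request."""
    # Authentication layer: map the presented token to a principal (or None).
    principal = VALID_TOKENS.get(request.token)
    source = ipaddress.ip_address(request.source_ip)
    checks = [
        ("network", any(source in net for net in ALLOWED_NETWORKS)),
        ("encryption", request.uses_tls),
        ("authentication", principal is not None),
        ("access control", request.action in ROLE_PERMISSIONS.get(principal, set())),
    ]
    failed = [name for name, ok in checks if not ok]
    decision = "allow" if not failed else "deny: " + ", ".join(failed)
    audit.record(request, principal, decision)
    return not failed


audit = AuditLog()
print(authorize(Request("10.0.1.5", "token-alice", "read", "s3://lake/sales"), audit))
print(authorize(Request("10.0.1.5", "token-alice", "write", "s3://lake/sales"), audit))
print(audit.entries[-1])
```

The point of the layering is that no single check is trusted on its own: a valid token presented from an unexpected network is still denied, and the denial is still audited.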

In addition to demonstrating how to set up these critical security layers on AWS and Azure, Tomer and Jacques show how data can be secured at a more granular level with column- and row-based access control as well as masking of personally identifiable information (PII). By the time you leave, you’ll have a deep understanding of the range of data lake security capabilities available in the main public clouds, as well as best practices for integrating these capabilities with the processing and analysis layers of the cloud data lake.
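To make the granular controls concrete, here is a minimal sketch of column-based access control, row-based access control, and PII masking applied to in-memory records. It is an assumption-laden illustration, not the speakers' implementation: the `POLICIES` table, the role names, the column names, and the `mask` and `apply_policy` helpers are all hypothetical, and a real data lake engine would typically enforce such rules in the query layer rather than in application code.

```python
import hashlib

# Hypothetical per-role policy: visible columns, masked columns, and a row filter.
POLICIES = {
    "analyst": {
        "columns": ["region", "email", "amount"],  # "ssn" is hidden entirely
        "masked": {"email"},
        "row_filter": lambda row: row["region"] == "EU",
    },
    "admin": {
        "columns": ["region", "email", "ssn", "amount"],
        "masked": set(),
        "row_filter": lambda row: True,
    },
}


def mask(value):
    """Replace a PII value with a short one-way hash.

    A truncated, unsalted hash is a simplification for this sketch and is
    not a recommendation for production masking.
    """
    return "masked-" + hashlib.sha256(value.encode("utf-8")).hexdigest()[:8]


def apply_policy(rows, role):
    """Return only the rows and columns the role may see, masking PII columns."""
    policy = POLICIES[role]
    result = []
    for row in rows:
        if not policy["row_filter"](row):  # row-based access control
            continue
        result.append({
            col: mask(str(row[col])) if col in policy["masked"] else row[col]
            for col in policy["columns"]  # column-based access control
        })
    return result


rows = [
    {"region": "EU", "email": "a@example.com", "ssn": "123-45-6789", "amount": 10},
    {"region": "US", "email": "b@example.com", "ssn": "987-65-4321", "amount": 20},
]
for visible in apply_policy(rows, "analyst"):
    print(visible)
```

Under this hypothetical policy, an analyst sees only EU rows, never sees the `ssn` column, and sees a masked token in place of each email address, while an admin sees the records unchanged.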

Prerequisite knowledge

  • Experience with AWS or Azure (useful but not required)

What you'll learn

  • Understand the range of data lake security capabilities available in the main public clouds, as well as best practices for integrating these capabilities with the processing and analysis layers of the cloud data lake

Tomer Shiran

Dremio

Tomer Shiran is cofounder and CEO of Dremio, the data lake engine company. Previously, Tomer was the vice president of product at MapR, where he was responsible for product strategy, road map, and new feature development and helped grow the company from 5 employees to over 300 employees and 700 enterprise customers; and he held numerous product management and engineering positions at Microsoft and IBM Research. He’s the author of eight US patents. Tomer holds an MS in electrical and computer engineering from Carnegie Mellon University and a BS in computer science from the Technion, the Israel Institute of Technology.


Jacques Nadeau

Dremio

Jacques Nadeau is the cofounder and CTO of Dremio. Previously, he ran MapR’s distributed systems team; was CTO and cofounder of YapMap, an enterprise search startup; and held engineering leadership roles at Quigo, Offermatica, and aQuantive. Jacques is cocreator and PMC chair of Apache Arrow, a PMC member of Apache Calcite, a mentor for Apache Heron, and the founding PMC chair of the open source Apache Drill project.