Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

Hybrid data lakes: Unlocking the inevitable (sponsored by Cask)

Jonathan Gray (Cask)
4:35pm–5:15pm Wednesday, September 27, 2017
Location: 1A 04/05
Average rating: 2.50 (2 ratings)

What you'll learn

  • Explore the many benefits of implementing a data lake, both on-premises and in the cloud, and the associated challenges


Enterprises benefit greatly from data lakes, but building an enterprise data lake isn’t a simple task. The right platform must be implemented and optimized to provide repeatable results, and certain challenges must be addressed, not least data management, security, and governance. To take advantage of the latest technology options for big data processing, storage, and resource management, which are easily accessible in the cloud, more and more enterprises are ready to start building hybrid data lakes. But as in the on-premises world, integration, operationalization, security, governance, and self-service usage remain formidable challenges in the cloud. What’s more, IT architects are increasingly asked to design portability into their data lake solutions so that analytics workloads can move easily between the data center and the cloud. A portability strategy increases business agility, shields application teams from frequent changes to their data pipelines across multiple environments, and can help reduce the risk of vendor lock-in.

Jonathan Gray explores the standardization, automation, and deep integration technologies that allow enterprises to transform their business by building and operating successful, self-service data lakes with IT guardrails on-premises and in the cloud, while avoiding the complexity, inefficiency, and risk that stem from the messy and diverse nature of big data. Jonathan discusses the many benefits of implementing a data lake, both on-premises and in the cloud, and addresses the associated challenges, including data integration, sanitization, security, and governance. He then explores how cloud technologies such as Azure WASB, Azure Data Lake Storage (ADLS), S3, Redshift, and U-SQL are helping enterprises overcome migration challenges to build portable, reliable, and secure data lakes that work seamlessly across on-premises and cloud deployments. He also shares customer use cases that will inspire enterprises to embark on a multi-environment data lake journey.

Topics include:

  • Drivers and benefits of enterprise data lakes
  • Choosing the right environment for your data lake (on-premises, cloud, or hybrid)
  • A layer of abstraction to ensure portability, reusability, and future-proofing
  • API support, management, and replication to operationalize a data lake
  • Sophisticated security, audit, and encryption for compliance needs
  • Self-service user experience for citizen integrators and business users
  • Packaged solutions and prebuilt components for rapid time to value

This session is sponsored by Cask.


Jonathan Gray


Jonathan Gray is the founder and CEO of Cask. Jonathan is an entrepreneur and software engineer with a background in startups, open source, and all things data. Previously, he was a software engineer at Facebook, where he helped drive HBase engineering efforts, including Facebook Messages and several other large-scale projects, from inception to production. An open source evangelist, Jonathan helped build the Facebook engineering brand through developer outreach and helped refocus the company’s open source strategy. Prior to Facebook, Jonathan founded a company where he became an early adopter of Hadoop and HBase. He is now a core contributor and active committer in the community. Jonathan holds a bachelor’s degree in electrical and computer engineering from Carnegie Mellon University.