Enterprises benefit greatly from data lakes, but building an enterprise data lake isn't a simple task. The right platform must be implemented and optimized to deliver repeatable results, and several challenges must be addressed, not least data management, security, and governance. To take advantage of the latest cloud-based options for big data processing, storage, and resource management, more and more enterprises are ready to start building hybrid data lakes. But as in the on-premises world, integrating, operationalizing, securing, and governing data, and enabling self-service usage, remain formidable obstacles in the cloud. What's more, IT architects are increasingly asked to design portability into their data lake solutions so that analytics workloads can move easily between the data center and the cloud. A portability strategy increases business agility, shields application teams from frequent changes to their data pipelines across multiple environments, and can help reduce the risk of vendor lock-in.
Jonathan Gray explores the standardization, automation, and deep integration technologies that let enterprises transform their business by building and operating successful self-service data lakes, with IT guardrails, both on-premises and in the cloud, while avoiding the complexity, inefficiency, and risk that stem from the messy and diverse nature of big data. Jonathan discusses the many benefits of implementing a data lake in either environment and addresses the associated challenges, including data integration, sanitization, security, and governance. He then explores how cloud technologies such as Azure WASB, Azure Data Lake Storage (ADLS), Amazon S3, Amazon Redshift, and U-SQL are helping enterprises overcome migration challenges and build portable, reliable, and secure data lakes that work seamlessly across on-premises and cloud deployments. Finally, he shares customer use cases that will inspire enterprises to embark on a multi-environment data lake journey.
This session is sponsored by Cask.
Jonathan Gray is the founder and CEO of Cask. Jonathan is an entrepreneur and software engineer with a background in startups, open source, and all things data. Previously, he was a software engineer at Facebook, where he helped drive HBase engineering efforts, taking Facebook Messages and several other large-scale projects from inception to production. An open source evangelist, Jonathan helped build the Facebook engineering brand through developer outreach and by refocusing the company's open source strategy. Before Facebook, Jonathan founded Streamy.com, where he became an early adopter of Hadoop and HBase. He is now a core contributor and active committer in the HBase community. Jonathan holds a bachelor's degree in electrical and computer engineering from Carnegie Mellon University.
©2017, O'Reilly Media, Inc.