Over the past several years, ever-increasing quantities of data have been processed in public clouds. The cloud promises to address some of the limitations of conventional single, multipurpose clusters by offering hyperscale storage decoupled from elastic, on-demand compute and by allowing data to be shared between processing engines provisioned on demand, such as Hive, Spark, and Impala. But to fulfill this promise, you first need to solve several technical challenges: simple resource allocation, cross-cluster metadata sharing, and a common authorization framework. Without comprehensive answers to these challenges, the problems of the single-cluster model are simply duplicated inside a public cloud environment.
The cloud addresses the limitations of single, multipurpose clusters by offering hyperscale storage decoupled from elastic, on-demand compute. Mala Ramakrishnan, Eugene Fratkin, and Mark Samson detail new paradigms for running production-level pipelines effectively with minimal operational overhead. As part of the deep dive, they also walk you through creating such a pipeline and executing data processing and data analytics workflows. Join in to learn how to remove barriers to data discovery, metadata sharing, and access control.
Mala Ramakrishnan heads product initiatives for Cloudera Altus, a big data platform as a service. She has more than 17 years of experience in product management, marketing, and software development in organizations of varied sizes that deliver middleware, software security, network optimization, and mobile computing. She holds a master’s degree in computer science from Stanford University.
Eugene Fratkin is a director of engineering at Cloudera, where he leads the company’s cloud infrastructure efforts. He was one of the founding members of the Apache MADlib project (scalable in-database algorithms for machine learning). Previously, Eugene was a cofounder of a Sequoia Capital-backed company focusing on applications of data analytics to problems in genomics. He holds a PhD in computer science from Stanford University’s AI lab.
Mark Samson is a principal systems engineer at Cloudera, helping customers solve their big data problems using enterprise data hubs based on Hadoop. Mark has 17 years’ experience working with big data and information management software in technical sales, service delivery, and support roles.
Vinithra Varadharajan is an engineering manager in the cloud organization at Cloudera, where she is responsible for products such as Cloudera Director and Cloudera’s usage-based billing service. Previously, Vinithra was a software engineer at Cloudera, working on Cloudera Director and Cloudera Manager with a focus on automating Hadoop lifecycle management.
Jason is a software engineer at Cloudera focusing on the cloud.
©2018, O’Reilly UK Ltd