Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

Interactive data exploration and analysis at enterprise scale

Sean Kandel (Trifacta), Kaushal Gandhi (Trifacta)
5:25pm–6:05pm Wednesday, September 27, 2017

Who is this presentation for?

  • Data scientists, data architects, Hadoop users, and business analysts

Prerequisite knowledge

  • Familiarity with Hadoop applications and data cleansing and preparation

What you'll learn

  • Learn best practices for deploying Hadoop applications to support data exploration at enterprise scale

Description

Organizations deploying Hadoop are storing, organizing, processing, and analyzing more data than ever before, and the number of analytic applications natively integrating with Hadoop has grown rapidly in the last few years. Consequently, there are often hundreds or thousands of business and data analysts who use Hadoop clusters to explore, wrangle, visualize, and operationalize data for diverse use cases. As cluster utilization increases, however, maintaining performance for both exploratory and production use cases becomes critical.

Sean Kandel and Kaushal Gandhi share best practices for building and deploying Hadoop applications to support large-scale data exploration and analysis across an organization, and demonstrate techniques to amortize exploratory workloads across clients so that deployments can scale while limiting performance degradation. Along the way, Sean and Kaushal explain how to flexibly compile queries across multiple runtime engines to optimize both analytic and transformation queries, and compare benchmarks for multiple architectures, demonstrating the effects of these techniques in data lake initiatives.

Sean Kandel

Trifacta

Sean Kandel is the founder and chief technical officer at Trifacta. Sean holds a PhD from Stanford University, where his research focused on new interactive tools for data transformation and discovery, such as Data Wrangler. Prior to Stanford, Sean worked as a data analyst at Citadel Investment Group.

Kaushal Gandhi

Trifacta

Kaushal Gandhi is a senior software engineer at Trifacta, where he built Trifacta’s fast interactive transformation engine (Photon) along with various data transformation features that improve the product's utility and usability. Previously, Kaushal built prediction and estimation software at NVIDIA. He holds an MS in computer science and engineering.