Presented By O'Reilly and Cloudera
Make Data Work
March 13–14, 2017: Training
March 14–16, 2017: Tutorials & Conference
San Jose, CA

Hoodie: Incremental processing on Hadoop at Uber

1:50pm–2:30pm Thursday, March 16, 2017
Spark & beyond
Location: LL20 A
Secondary topics: Platform, Streaming

What you'll learn

  • Explore data processing systems for near-real-time use cases at Uber
  • Discover Hoodie, Uber's newly open sourced storage system

Description

Uber’s mission is to provide transportation as reliable as running water, everywhere, for everyone. To fulfill its mission, Uber relies on making data-driven decisions at every level, and most of these decisions can benefit from faster data processing.

Vinoth Chandar and Prasanna Rajaperumal explore data processing systems for near-real-time use cases, making the case that adding new incremental processing primitives to existing Hadoop technologies can solve many problems at reduced cost and in a unified manner. Along the way, Vinoth and Prasanna introduce Hoodie, a storage system newly open sourced by Uber that applies this approach to provide near-real-time data at 10x lower cost using Spark and Hadoop, and they share their experience running it in production.

Vinoth Chandar

Uber

Vinoth Chandar works on data infrastructure at Uber, with a focus on Hadoop and Spark. Vinoth has a keen interest in unified architectures for data analytics and processing. Previously, Vinoth was the LinkedIn lead on Voldemort and worked on Oracle server’s replication engine, HPC, and stream processing.

Prasanna Rajaperumal

Uber

Prasanna Rajaperumal is a senior engineer at Uber working on the next generation of Uber’s data infrastructure and on data systems that scale with Uber’s hypergrowth. Over the last six months, he has focused on building a library that ingests change logs into large HDFS datasets, optimized for analytical workloads. Prasanna has held various roles building data systems at companies both small and large. Previously, he was a software engineer at Cloudera, where he built data infrastructure for indexing and visualizing customer log files.