Presented By O'Reilly and Cloudera
Make Data Work
March 13–14, 2017: Training
March 14–16, 2017: Tutorials & Conference
San Jose, CA

Hoodie: Incremental processing on Hadoop at Uber

Vinoth Chandar (Apache Hudi), Prasanna Rajaperumal (Uber)
1:50pm–2:30pm Thursday, March 16, 2017
Spark & beyond
Location: LL20 A
Secondary topics: Platform, Streaming

What you'll learn

  • Explore data processing systems for near-real-time use cases at Uber
  • Discover Hoodie, Uber's newly open-sourced storage system


Uber’s mission is to provide transportation as reliable as running water, everywhere, for everyone. To fulfill its mission, Uber relies on making data-driven decisions at every level, and most of these decisions can benefit from faster data processing.

Vinoth Chandar and Prasanna Rajaperumal explore data processing systems for near-real-time use cases. They make the case that adding new incremental processing primitives to existing Hadoop technologies can solve many of these problems at reduced cost and in a unified manner. Along the way, Vinoth and Prasanna introduce Hoodie, a storage system Uber recently open-sourced that brings these primitives to Hadoop and delivers near-real-time data at 10x reduced cost using Spark and Hadoop. They also share their production experience.


Vinoth Chandar

Apache Hudi

Vinoth Chandar is the co-creator of the Hudi project at Uber and the PMC/lead of Apache Hudi (Incubating). He was previously a senior staff engineer at Uber, where he led projects across technology areas such as data infrastructure, data architecture, and mobile/network performance. Vinoth has a keen interest in unified architectures for data analytics and processing. Earlier, he was the LinkedIn lead on Voldemort and worked on Oracle Server’s replication engine, HPC, and stream processing.


Prasanna Rajaperumal

Uber

Prasanna Rajaperumal is a senior engineer at Uber, where he works on the next generation of Uber’s data infrastructure, building data systems that scale with Uber’s hypergrowth. Over the last six months, he has focused on building a library that ingests change logs into large HDFS datasets optimized for analytical workloads. Prasanna has held various roles building data systems at companies both small and large. Previously, he was a software engineer at Cloudera, where he built out data infrastructure for indexing and visualizing customer log files.