Vinoth Chandar explains how Uber revamped its foundational data infrastructure, with Hadoop as the source-of-truth data lake and Spark as the de facto processing engine, and shares lessons from the experience. He gives an overview of Uber's data ecosystem and details its old and current data architectures, discussing some of the unique challenges that shaped them. Vinoth also shares the roadmap ahead in areas such as all-active data architecture, Spark infrastructure, and interactive SQL, along with a larger initiative to reduce the latency of data arriving in Hadoop.
Vinoth Chandar is the co-creator of the Hudi project at Uber and also PMC/Lead of Apache Hudi (Incubating). Previously, he was a senior staff engineer at Uber, where he led projects across technology areas such as data infrastructure, data architecture, and mobile/network performance. Vinoth has a keen interest in unified architectures for data analytics and processing. Earlier, he was the LinkedIn lead on Voldemort and worked on Oracle Server's replication engine, HPC, and stream processing.
©2016, O'Reilly Media, Inc.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with, and does not endorse or review, the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.