Uber operates at scale: thousands of microservices serve millions of rides a day, generating more than 100 PB of data. This data powers multiple business use cases, such as machine learning, model training, data preparation, traditional business intelligence, visualization, and reporting, but it must first be ingested, transformed, and dispersed before it can provide value to the business.
To democratize data pipelines, Uber needed a central tool for authoring, managing, scheduling, and deploying data workflows at scale. Alex Kira details Uber’s journey toward a unified and scalable data workflow system for managing this data. He shares the challenges the company faced and explains how it rearchitected several components of the system, such as scheduling and serialization, to make them highly available and more scalable. Alex also outlines future plans for making the workflow platform more streamlined and easier to use.
Alex Kira is an engineering tech lead at Uber, where he works on the data workflow management team. His team provides a data infrastructure platform for thousands of engineers, data scientists, and city ops, thereby empowering them to own and manage their data pipelines. During his 19-year career, he’s had experience across several software disciplines, including distributed systems, data infrastructure, and full stack development, giving him a holistic systems view of his projects. He holds an undergraduate degree in computer science from the University of Miami and a master’s degree from the Georgia Institute of Technology. In his free time, Alex enjoys hiking around the Bay Area, rock climbing, and traveling internationally.
©2019, O'Reilly Media, Inc.