Turning big data into knowledge: Managing metadata and data relationships at Uber's scale
Who is this presentation for?
- Data scientists, engineers, data analysts, and product managers
Uber takes being data driven to the next level, given the complexity of its systems and the breadth of its data: it processes trillions of Kafka messages per day, operates thousands of microservices, stores hundreds of petabytes of data in the Hadoop Distributed File System (HDFS) across multiple data centers, and supports millions of weekly analytical queries.
Kaan Onuk, Luyao Li, and Atul Gupte explore the current state of metadata and lineage management at Uber’s scale and share a sneak peek of what’s coming next in big data management.
Big data by itself isn’t enough to yield insights. To be used efficiently and effectively, data at Uber’s scale requires context before it can inform business decisions. To provide that context, the company built Databook, Uber’s in-house platform that surfaces and manages metadata, and uStruct, the lineage platform that tracks end-to-end data flow and manages relationships from Uber’s mobile app to services, storage, and analytics.
Prerequisite knowledge
- A basic understanding of big data concepts
What you'll learn
- Discover how Uber thinks about building big data knowledge platforms to allow teams to discover, manage, and govern entities
- Explore how to build an extensible data management platform and infrastructure to democratize data at Uber's scale
Kaan Onuk is an engineering manager at Uber, where he leads the metadata management team in the Big Data org. Previously, he was a tech lead at Uber, where he designed and built infrastructure to power data discovery and data privacy, and he helped build data infrastructure from the ground up at Graphiq, a startup acquired by Amazon. Kaan holds a master’s degree in electrical engineering from the University of Southern California.
Luyao Li is a technical lead manager on the data platform team at Uber, where he manages the data lineage team, which builds systems including end-to-end data flow tracking, latency tracking, and cost attribution and pricing. Previously, as a software engineer at Electronic Arts, he built multiple systems spanning service discovery, configuration management, and ad campaign results tracking and reporting. He holds a master’s degree from Duke University.
Atul Gupte is a product manager on the product platform team at Uber, where he helps drive product decisions to ensure Uber’s data science teams are able to achieve their full potential by providing access to foundational infrastructure, stable compute resources, and advanced tooling to power Uber’s global ambitions. Previously, he built some of the world’s leading social games and helped build out the mobile advertising platform at Zynga. He holds a BS in computer science from the University of Illinois at Urbana-Champaign.