Sep 23–26, 2019

We run, we improve, we scale: The XGBoost story at Uber

Nan Zhu (Uber), Felix Cheung (Uber)
11:20am–12:00pm Wednesday, September 25, 2019
Location: 1A 08/10
Average rating: 4.50 (6 ratings)

Who is this presentation for?

  • Machine learning engineers and data scientists

Level

Intermediate

Description

As Uber’s business has grown, the agility and scalability of its machine learning system have become core prerequisites for making data-driven decisions that improve user experiences. XGBoost fits Uber’s requirements well and plays multiple roles across the business. It not only produces accurate models but also scales to handle billions of records and thousands of features. XGBoost models improve driver safety, recommend foods and restaurants, and estimate the arrival time of rides, among other uses.

Nan Zhu and Felix Cheung share their insights into the internals of how XGBoost scales training to hundreds, even thousands, of workers with an accuracy guarantee. This is the first time that a core community member has presented the detailed internals of distributed training to a public audience. They also detail Uber’s journey with the latest version of XGBoost, including the problems the company had with the earlier version and how it identified and fixed them, eventually unblocking itself by improving XGBoost and contributing back to the community. You’ll leave with a summary of lessons Uber learned and insight into its future plans.

Prerequisite knowledge

  • A basic understanding of tree-based machine learning models
  • Experience with XGBoost

What you'll learn

  • Get an overview of business problems Uber is solving with XGBoost
  • Learn how Uber improves XGBoost model training to deliver business impact at greater scale
  • Discover what's going to happen with XGBoost in the near future

Nan Zhu

Uber

Nan Zhu is a software engineer at Uber. He works on optimizing Spark for Uber’s scenarios and scaling XGBoost in Uber’s machine learning platform. Nan has been a committee member of XGBoost since 2016. He started the XGBoost4J-Spark project, which facilitates distributed training in XGBoost, as well as the work on fast histogram algorithms in distributed training.

Felix Cheung

Uber

Felix Cheung is an engineering manager II at Uber and a PMC and committer for Apache Spark. Felix started his journey in the big data space about five years ago with the then-state-of-the-art MapReduce. Since then, he’s (re-)built Hadoop clusters from metal more times than he would like, created a Hadoop distro from two dozen or so projects, and juggled hundreds to thousands of cores in the cloud or in data centers. He built a few interesting apps with Apache Spark and ended up contributing to the project. In addition to building stuff, he frequently presents at conferences, meetups, and workshops. He was also a teaching assistant for the first set of edX MOOCs on Apache Spark.

Comments

Anushka Jadhav | sr software engineer
10/09/2019 4:11pm EDT

+1. Can you please add the slides here

Kaushik Deka | Director, Novantas
09/30/2019 3:19am EDT

Can you please post your presentation slides?
