We run, we improve, we scale - XGBoost story in Uber
Who is this presentation for?
Machine Learning Engineers, Data Scientists
Prerequisite knowledge
Beginner understanding of tree-based machine learning models
What XGBoost is
Brief experience with XGBoost
What you'll learn
As Uber's business grows rapidly in scale, the agility and scalability of our machine learning systems are a core prerequisite for making data-driven decisions that improve our user experience.
XGBoost fits our requirements well and is used across our business. It not only produces accurate models but also scales to handle billions of records and thousands of features. We have XGBoost models that improve driver safety, recommend foods and restaurants, and estimate the arrival time of rides, among other uses.
This talk, given by the XGBoost team at Uber and a committee member of the open source XGBoost project, offers insights into:
(1) The internals of how XGBoost scales training to hundreds or even thousands of workers with an accuracy guarantee. This is the first time a core community member has presented the detailed internals of distributed training to a public audience.
(2) Uber's journey with the latest version of XGBoost. We will talk about the problems we met with earlier versions of XGBoost and how we identified and fixed them, eventually unblocking ourselves by improving XGBoost and contributing back to the community. Finally, we will summarize the lessons we learned and our future plans for XGBoost.
Nan Zhu is a software engineer at Uber. He works on optimizing Apache Spark for Uber's scenarios and on scaling XGBoost in Uber's machine learning platform. Nan has been a committee member of XGBoost since 2016. He started the XGBoost4J-Spark project, which integrates XGBoost with Spark, as well as the fast histogram algorithm in distributed training.
Felix Cheung is an engineer at Uber and a PMC and committer for Apache Spark. Felix started his journey in the big data space about five years ago with the then state-of-the-art MapReduce. Since then, he’s (re-)built Hadoop clusters from metal more times than he would like, created a Hadoop distro from two dozen or so projects, and juggled hundreds to thousands of cores in the cloud or in data centers. He built a few interesting apps with Apache Spark and ended up contributing to the project. In addition to building stuff, he frequently presents at conferences, meetups, and workshops. He was also a teaching assistant for the first set of edX MOOCs on Apache Spark.