Airbnb’s data-driven products present a wide variety of unique ML problems, ranging from traditional models built on structured data to state-of-the-art models that leverage unstructured data, such as user reviews, messages, and images. The ability to build, iterate on, and maintain healthy machine learning models is critical to Airbnb’s success.
An end-to-end solution typically needs to cover data collection, feature engineering, training, deploying, serving, and monitoring. Presently, few platforms are capable of doing all of the above in a user-friendly way. Moreover, the heterogeneous nature of ML problems and the requirement of scalability pose challenges to fast iteration and productionization.
Atul Kale and Xiaohan Zeng offer an overview of Bighead, Airbnb’s user-friendly and scalable end-to-end machine learning framework that powers Airbnb’s data-driven products. Bighead is built on Python, Spark, and Kubernetes. Its components include a lifecycle management service, an offline training and inference engine, an online inference service, a prototyping environment, and a Docker image customization tool. Each component can be used individually. In addition, Bighead includes a unified model-building API that smoothly integrates popular libraries, including TensorFlow, XGBoost, and PyTorch. Each model is reproducible and easy to iterate on because data collection and transformation, model training environments, and production deployment are standardized.
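To make the idea of a unified model-building API concrete, here is a minimal sketch in plain Python. It is not Bighead's actual API: the names Model, MeanRegressor, SlopeRegressor, and Pipeline are hypothetical and chosen only for illustration, and the two toy regressors stand in for models that would in practice wrap a library such as TensorFlow, XGBoost, or PyTorch. The point is only that a shared fit/predict interface lets the same transformation steps run in training and in serving, whichever model sits underneath.

```python
from abc import ABC, abstractmethod
from typing import Callable, List, Sequence

Row = Sequence[float]


class Model(ABC):
    """Hypothetical common interface (an assumption, not Bighead's API)."""

    @abstractmethod
    def fit(self, X: List[Row], y: List[float]) -> "Model":
        ...

    @abstractmethod
    def predict(self, X: List[Row]) -> List[float]:
        ...


class MeanRegressor(Model):
    """Toy stand-in for a wrapped library model: predicts the training mean."""

    def __init__(self) -> None:
        self.mean_ = 0.0

    def fit(self, X: List[Row], y: List[float]) -> "MeanRegressor":
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X: List[Row]) -> List[float]:
        return [self.mean_ for _ in X]


class SlopeRegressor(Model):
    """Toy stand-in: least-squares line through the origin on the first feature."""

    def __init__(self) -> None:
        self.slope_ = 0.0

    def fit(self, X: List[Row], y: List[float]) -> "SlopeRegressor":
        num = sum(row[0] * target for row, target in zip(X, y))
        den = sum(row[0] * row[0] for row in X)
        self.slope_ = num / den
        return self

    def predict(self, X: List[Row]) -> List[float]:
        return [self.slope_ * row[0] for row in X]


class Pipeline(Model):
    """Applies the same row-wise transforms before fitting and before predicting."""

    def __init__(self, transforms: List[Callable[[Row], Row]], model: Model) -> None:
        self.transforms = transforms
        self.model = model

    def _transform(self, X: List[Row]) -> List[Row]:
        out = []
        for row in X:
            for transform in self.transforms:
                row = transform(row)
            out.append(row)
        return out

    def fit(self, X: List[Row], y: List[float]) -> "Pipeline":
        self.model.fit(self._transform(X), y)
        return self

    def predict(self, X: List[Row]) -> List[float]:
        return self.model.predict(self._transform(X))


def double(row: Row) -> Row:
    """Example transform: scale every feature by 2."""
    return [2.0 * value for value in row]


X_train = [[1.0], [2.0], [3.0]]
y_train = [2.0, 4.0, 6.0]

# Swapping the underlying model does not change the surrounding code.
mean_pipeline = Pipeline([double], MeanRegressor()).fit(X_train, y_train)
slope_pipeline = Pipeline([double], SlopeRegressor()).fit(X_train, y_train)
```

Because both toy models satisfy the same interface, the Pipeline wrapper never needs to know which one it holds; a real integration would put library-specific code behind fit and predict in the same way.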
Atul and Xiaohan explore Bighead’s architecture, detail the problems that each individual component and the overall system aim to solve, and outline a vision for the future of machine learning infrastructure. Bighead is widely adopted at Airbnb, with a variety of models in production, and has enabled the company to reduce model development time from months to days. Airbnb plans to open source Bighead to allow the broader community to benefit from this work.
Atul Kale is a software engineer on Airbnb’s machine learning infrastructure team. Previously, Atul worked in finance building machine learning-driven proprietary trading strategies and the data pipelines to support them. He holds a degree in computer engineering from the University of Illinois Urbana-Champaign.
Xiaohan Zeng is a software engineer on the machine learning infrastructure team at Airbnb. Previously, he worked on the machine learning platform team at Groupon. He holds degrees in chemical engineering from Tsinghua University and Northwestern University but started to pursue a career in software engineering and machine learning after doing research in data science. Outside of work, he enjoys reading, writing, traveling, movies, and trying to follow his daughter around when she suddenly decides to practice walking.
Strata Data Conference • ©2018, O'Reilly Media, Inc.