Presented By
O’Reilly + Intel AI
Put AI to Work
April 15-18, 2019
New York, NY

Building a production-scale ML platform

YU DONG (Facebook Inc)
1:50pm-2:30pm Wednesday, April 17, 2019
Implementing AI
Location: Rendezvous
Secondary topics:  Media, Marketing, Advertising, Platforms and infrastructure

Who is this presentation for?

ML engineers, data scientists, product managers, and research scientists



Prerequisite knowledge

A basic understanding of ML is enough.

What you'll learn

1. Understand the main motivation for, and challenges of, building a production-scale ML platform.
2. Understand the representative use cases of a production-scale ML platform in our products and services.
3. Understand the key ML workflow, its components, and the proposed approaches to achieving production-scale performance in an ML platform.


In this presentation, I describe the why, what, and how of building a production-scale ML platform, based on ongoing ML research trends and industry adoption.

1. Democratized AI: In August 2018, Gartner said that democratized AI will be one of the major trends shaping our future technologies. The research is based on the so-called “Hype Cycle” (see image below), which distills insights from over 2,000 technologies into 35 main areas of interest and trends, with a particular focus on innovations that could give businesses a future competitive advantage. AI technologies will be “virtually everywhere” over the next 10 years, but they will be open to the masses rather than purely commercial. Cloud computing, open-source projects, and the “maker” community will mold this trend, eventually “propelling AI into everyone’s hands.” AI-based Platform as a Service (PaaS) solutions, autonomous driving, mobile robots, and conversational AI platforms and assistants are expected to become major enterprise technologies in the future.
2. “One size doesn’t fit all”. If you go to a clothing store and ask an employee for recommendations, the answer will likely depend on your appearance, gender, and any other information you provide. Most of the time, people look for increasingly personalized products and services where applicable. The same principle applies to the vast majority of artificial intelligence technologies: we want to act differently based on the information we’re given, and a discriminative model might be preferred here instead of a generative one. This “one size doesn’t fit all” trend will drive widespread demand for a production-scale ML platform that can digest large volumes of raw data from a variety of sources and generate or enable personalized models, services, and products at scale.

What is Challenging
1. Scalability: Scale affects the whole ML lifecycle, from larger datasets to more complex features and models to increasing prediction requests. This brings scalability challenges to the ML platform and its underlying infrastructure resources, from compute to storage to network.
2. Stability: Stability is critical to any software platform, since no one expects much from an unstable platform that keeps failing requests. For ML scenarios, ensuring a successful end-to-end (E2E) ML workflow is a growing challenge because of three trends: exploration of more complicated models, processing of larger amounts of unverified data, and adoption of cheaper commodity hardware.
3. Cost-aware: Everyone wants to train a perfect ML model that serves all requests optimally, but no one can afford an unbounded training cost, since every company has its own budget, whether it is an established Fortune 500 company or a fast-growing start-up. The cost here includes not only compute servers, storage, and power usage but also developer salaries and time. A cost-aware ML process is becoming a determining factor in any ML platform’s cost efficiency and economies of scale.
4. Usability: Not every ML platform user is an ML expert. According to a recent survey, roughly 75% of future ML developers might simply use pre-trained ML models directly, or do some simple tuning, and deploy them in their projects. On the other hand, some ML researchers and engineers will use the platform to run a wide range of experiments, from complicated feature engineering to innovative model architecture search. Building a highly usable ML platform that serves the different needs of its users is a non-trivial challenge.



Facebook Inc

As a senior technical product manager at Facebook, I work on the AI/ML platform (FBLearner) that enables more personalized and smarter FB products.

Previously, I was a senior software engineering manager at Hewlett Packard Enterprise and Cisco Systems. I hold a Ph.D. in Computer Engineering and an MBA from the University of California, Berkeley. My passion is to democratize AI across industries by building a performant, reliable, efficient, resilient, and easy-to-use AI platform.
