Scaling TensorFlow at LinkedIn
Who is this presentation for?
- Machine learning engineers and machine learning infrastructure engineers
Deep learning has become widespread as frameworks such as TensorFlow have made it easy to onboard machine learning applications. However, while it's easy to start developing with these frameworks locally on megabytes of data, scaling up deep learning models in the enterprise is still challenging.
Machine learning has many parts: data ingestion, data preparation, feature extraction, model development, model training, model deployment, and model serving. In order to plug TensorFlow models into the pipeline, you need to fill the gaps between existing machine learning infrastructure and the requirements of deep learning frameworks. Training with TensorFlow also generally requires larger datasets and advanced computing hardware such as GPUs; it's challenging to scale infrastructure to support such demanding computing scenarios and to use resources effectively.
Keqiu Hu, Jonathan Hung, and Abin Shahab explore how the LinkedIn TensorFlow infrastructure team made several critical enhancements to its machine learning pipeline to address TensorFlow's challenges. Central to these innovations is LinkedIn's home-grown open source deep learning platform, TonY. TonY started as a library to natively launch TensorFlow jobs on Apache Hadoop and has evolved into a platform to holistically productionize deep learning on top of LinkedIn's existing data ecosystem, with features including a TensorFlow history server (with MLflow support), integration with LinkedIn's Dr. Elephant for job tuning, and integration with LinkedIn's Avro2TF pipeline and Quasar scoring solution.
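One concrete piece of what "natively launching TensorFlow jobs" involves is wiring up distributed training: launchers like TonY set the standard TF_CONFIG environment variable for each task, which the training script reads to discover its role in the cluster. Below is a minimal sketch of that handshake from the script's side; the host names and cluster shape are hypothetical, and the exact spec TonY generates may differ.

```python
import json
import os

# In a real run, the launcher (e.g., TonY) would export TF_CONFIG before
# starting this process. For illustration, we set a hypothetical value here.
os.environ.setdefault("TF_CONFIG", json.dumps({
    "cluster": {
        "ps": ["ps-host:2222"],                        # parameter server(s)
        "worker": ["worker-0:2222", "worker-1:2222"],  # training workers
    },
    "task": {"type": "worker", "index": 0},            # this process's role
}))

# The training script parses TF_CONFIG to learn who it is and who its peers are.
tf_config = json.loads(os.environ["TF_CONFIG"])
role = tf_config["task"]["type"]
index = tf_config["task"]["index"]
num_workers = len(tf_config["cluster"]["worker"])

print(f"Running as {role} #{index} of {num_workers} workers")
```

From here, the script would typically hand the parsed cluster spec to a TensorFlow distribution strategy; the value of a launcher like TonY is that users never write this environment plumbing by hand.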
You'll discover how LinkedIn integrates TensorFlow into its machine learning pipeline, learn about new Hadoop features useful for deep learning, and take a deep dive into TonY. Keqiu, Jonathan, and Abin share success stories about how LinkedIn has significantly improved existing large-scale machine learning models with TensorFlow in production.
Prerequisite knowledge
- A basic understanding of TensorFlow and Hadoop
What you'll learn
- Learn how to enable and scale TensorFlow in the enterprise
Keqiu Hu is a staff software engineer at LinkedIn, where he’s working on LinkedIn’s big data platforms, primarily focusing on TensorFlow and Hadoop.
Jonathan Hung is a senior software engineer on the Hadoop development team at LinkedIn.
Abin Shahab is a staff software engineer at LinkedIn. Since 2014 he has been working on containers and containerizing big data workloads. He's a contributor to Docker, runc, LXC, cAdvisor (part of the Kubelet), YARN's container runtime, and Kubeflow. Currently he's leading LinkedIn's deep learning infrastructure team. His other passion is software architecture, which was his focus during his graduate studies at Carnegie Mellon University. In his free time (usually after both his daughters are in bed), he reads sci-fi.