Scaling TensorFlow at LinkedIn
Who is this presentation for?
- Machine learning engineers and machine learning infrastructure engineers
Description
Deep learning has become widespread as frameworks such as TensorFlow have made it easy to onboard machine learning applications. However, while it’s easy to start developing with these frameworks in your local environment with MBs of data, scaling up deep learning models in enterprises is still challenging.
A machine learning pipeline has many parts: data ingestion, data preparation, feature extraction, model development, model training, model deployment, and model serving. To plug TensorFlow models into the pipeline, you need to fill the gaps between existing machine learning infrastructure and the requirements of deep learning frameworks. Training with TensorFlow also generally requires a larger dataset and advanced computing hardware like GPUs; it’s challenging to scale infrastructure to enable such highly demanding computing scenarios and use resources effectively.
Keqiu Hu, Jonathan Hung, and Abin Shahab explore how the LinkedIn TensorFlow infra team made several critical enhancements to its machine learning pipeline to address TensorFlow’s challenges. Central to these innovations is LinkedIn’s home-brewed open source deep learning platform called TonY. TonY started as a library to natively launch TensorFlow jobs on Apache Hadoop and has evolved into a platform to holistically productionize deep learning on top of LinkedIn’s existing data ecosystem, with features including a TensorFlow history server (with MLflow support), integration with LinkedIn’s Dr. Elephant for job tuning, and integration with LinkedIn’s Avro2TF pipeline and Quasar scoring solution.
You’ll discover how LinkedIn integrates TensorFlow into its machine learning pipeline, learn about new Hadoop features useful for deep learning, and take a deep dive into TonY. Keqiu, Jonathan, and Abin share success stories about how LinkedIn has significantly improved existing large-scale machine learning models with TensorFlow in production.
Prerequisite knowledge
- A basic understanding of TensorFlow and Hadoop
What you'll learn
- How to enable and scale TensorFlow in the enterprise
Keqiu Hu
Keqiu Hu is a staff software engineer at LinkedIn, where he’s working on LinkedIn’s big data platforms, primarily focusing on TensorFlow and Hadoop.
Jonathan Hung
Jonathan Hung is a senior software engineer on the Hadoop development team at LinkedIn.
Abin Shahab
Abin Shahab is a staff software engineer at LinkedIn. Since 2014, he has been working on containers and containerizing big data workloads. He’s a contributor to Docker, runc, lxc, cAdvisor (part of Kubelet), YARN’s container runtime, and Kubeflow. Currently, he’s leading LinkedIn’s deep learning infra team. His other passion is software architecture, which was his focus during his graduate studies at Carnegie Mellon University. In his free time (usually after both his daughters are in bed), he reads sci-fi.