October 28–31, 2019

Scaling TensorFlow at LinkedIn

Keqiu Hu (LinkedIn), Jonathan Hung (LinkedIn), Abin Shahab (LinkedIn)
4:10pm–4:50pm Thursday, October 31, 2019
Location: Grand Ballroom E

Who is this presentation for?

  • Machine learning engineers and machine learning infrastructure engineers




Deep learning has become widespread as frameworks such as TensorFlow have made it easy to onboard machine learning applications. However, while it’s easy to start developing with these frameworks in your local environment with MBs of data, scaling up deep learning models in enterprises is still challenging.

A machine learning pipeline has many parts: data ingestion, data preparation, feature extraction, model development, model training, model deployment, and model serving. To plug TensorFlow models into the pipeline, you need to fill the gaps between existing machine learning infrastructure and the requirements of deep learning frameworks. Training with TensorFlow also generally requires a larger dataset and advanced computing hardware like GPUs; it’s challenging to scale infrastructure to support such demanding computing scenarios and use resources effectively.

Keqiu Hu, Jonathan Hung, and Abin Shahab explore how the LinkedIn TensorFlow infra team made several critical enhancements to its machine learning pipeline to address TensorFlow’s challenges. Central to these innovations is TonY, LinkedIn’s homegrown open source deep learning platform. TonY started as a library to natively launch TensorFlow jobs on Apache Hadoop and has evolved into a platform to holistically productionize deep learning on top of LinkedIn’s existing data ecosystem, with features including a TensorFlow history server (with MLflow support), integration with LinkedIn’s Dr. Elephant for job tuning, and integration with LinkedIn’s Avro2TF pipeline and Quasar scoring solution.

You’ll discover how LinkedIn integrates TensorFlow into its machine learning pipeline, learn about new Hadoop features useful for deep learning, and take a deep dive into TonY. Keqiu, Jonathan, and Abin share success stories about how LinkedIn has significantly improved existing large-scale machine learning models with TensorFlow in production.

Prerequisite knowledge

  • A basic understanding of TensorFlow and Hadoop

What you'll learn

  • How to enable and scale TensorFlow in the enterprise

Keqiu Hu


Keqiu Hu is a staff software engineer at LinkedIn, where he’s working on LinkedIn’s big data platforms, primarily focusing on TensorFlow and Hadoop.


Jonathan Hung


Jonathan Hung is a senior software engineer on the Hadoop development team at LinkedIn.

Abin Shahab


Abin Shahab is a staff software engineer at LinkedIn. Since 2014, he has been working on containers and containerizing big data workloads. He’s a contributor to Docker, runc, lxc, cAdvisor (part of Kubelet), YARN’s container runtime, and Kubeflow. Currently, he’s leading LinkedIn’s deep learning infra team. His other passion is software architecture, which was his focus during his graduate studies at Carnegie Mellon University. In his free time (usually after both his daughters are in bed), he reads sci-fi.
