Presented By
O’Reilly + Cloudera
Make Data Work
March 25-28, 2019
San Francisco, CA

Deep learning beyond the learning

Tobias Knaup (Mesosphere), Joerg Schad (ArangoDB)
11:50am-12:30pm Wednesday, March 27, 2019
Average rating: 4.50 (2 ratings)

Who is this presentation for?

  • Data scientists, developers, and architects interested in building deep learning pipelines

Level

Intermediate

Prerequisite knowledge

  • Familiarity with data science tools and concepts (useful but not required)

What you'll learn

  • Understand the challenges involved in implementing a deep learning pipeline
  • Explore a potential end-to-end implementation

Description

Open source frameworks such as Spark, TensorFlow, MXNet, and PyTorch enable anyone to build and train deep learning models. While there are many great tutorials and talks showing the best way to train models, there’s little information on what happens before and after training: how to develop, store, utilize, test, and refine a model.

Tobias Knaup and Joerg Schad offer an introduction to building a complete automated deep learning pipeline, from exploratory analysis through training, model storage, model serving, and monitoring.

Topics include:

  • How to enable data scientists to develop models without having to worry about the underlying infrastructure
  • How to automate distributed training, model optimization, and serving using CI/CD
  • How to easily deploy these distributed deep learning frameworks on any public or private infrastructure
  • How to manage multiple different deep learning frameworks on a single cluster, especially considering heterogeneous resources such as GPUs
  • The best interface to use when working with the cluster
  • How to store and serve models at scale
  • How to update models that are currently in use without causing downtime for the services using them
  • How to monitor the entire pipeline and track performance of the deployed models

Tobias Knaup

Mesosphere

Tobi Knaup is the cofounder and CTO at Mesosphere, a hybrid cloud platform company that helps companies such as NBCUniversal, Deutsche Telekom, and Royal Caribbean adopt transformative technologies like machine learning and real-time analytics with ease. He was one of the first engineers and a tech lead at Airbnb, where he wrote large parts of the company’s infrastructure, including its search and fraud prediction services, and helped scale the site to millions of users and build a world-class engineering team. Tobi is the main author of Marathon, Mesosphere’s container orchestrator.


Joerg Schad

ArangoDB

Jörg Schad is Head of Machine Learning at ArangoDB. Previously, he worked on or built machine learning pipelines in healthcare, distributed systems at Mesosphere, and in-memory databases. He received his Ph.D. for research on distributed databases and data analytics. He’s a frequent speaker at meetups, international conferences, and lecture halls.