Presented By
O’Reilly + Cloudera
Make Data Work
29 April–2 May 2019
London, UK

Large-scale ML with MLflow, deep learning, and Apache Spark

Amir Issaei (Databricks)
Monday, 29 April & Tuesday, 30 April, 9:00 - 17:00
Data Science, Machine Learning & AI
Location: Capital Suite 17
Secondary topics: Deep Learning, Model lifecycle management

Participants should plan to attend both days of this 2-day training course. To attend training courses, you must register for a Platinum or Training pass. Training course registration does not include access to tutorials on Tuesday.

Join Amir Issaei to explore neural network fundamentals and learn how to build distributed Keras/TensorFlow models on top of Spark DataFrames. You'll use Keras, TensorFlow, Deep Learning Pipelines, and Horovod to build and tune models and MLflow to track experiments and manage the machine learning lifecycle. This course is taught entirely in Python.

What you'll learn, and how you can apply it

  • Learn how to build a neural network with Keras
  • Understand the difference between various activation functions and optimizers
  • Discover how to track experiments with MLflow
  • Learn how to apply models at scale with Deep Learning Pipelines
  • Understand how to build distributed TensorFlow models with Horovod

This training is for you because...

  • You're a practicing data scientist who's eager to get started with deep learning.
  • You're a software engineer or technical manager interested in a thorough, hands-on overview of deep learning and its integration with Apache Spark.

Prerequisites:

  • A working knowledge of Python (NumPy and pandas) and Spark DataFrames
  • Familiarity with data science

Hardware and/or installation requirements:

  • A WiFi-enabled laptop with the Chrome (preferred) or Firefox web browser installed
  • The ability to access the following domains: databricks.com, keras.io, and spark.apache.org

Outline

Intro to neural networks with Keras I

  • Neural network architecture
  • Batch sizes and epochs
  • Evaluation metrics
  • Keras API

Intro to neural networks with Keras II

  • Activation functions
  • Data normalization
  • Optimizers
  • Custom metrics
  • Validation dataset
  • Callbacks/checkpointing
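
As a rough preview of two topics in this module, the sketch below writes out two common activation functions and a simple feature standardization in plain Python. It is an illustration only, not course material: the function names are hypothetical, and in practice Keras supplies its own implementations.

```python
import math


def relu(x):
    """Rectified linear unit: passes positive inputs through, zeroes out the rest."""
    return max(0.0, x)


def sigmoid(x):
    """Logistic sigmoid: squashes any real input into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def standardize(values):
    """Rescale a feature column to zero mean and unit variance (assumed scheme)."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(variance)
    if std == 0.0:
        # A constant column carries no spread; map every entry to zero.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


if __name__ == "__main__":
    print(relu(-2.0), relu(3.0))
    print(sigmoid(0.0))
    print(standardize([1.0, 2.0, 3.0]))
```

Standardizing inputs before training keeps features on comparable scales, which is one reason normalization and the choice of activation function are usually discussed together.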

MLflow

  • Experiment tracking
  • Record which model and hyperparameters performed best

Convolutional neural networks

  • Working with image data
  • Convolutions
  • Max pooling versus average pooling
  • ImageNet architectures
  • Deep Learning Pipelines: Apply pretrained models in parallel
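
To make the "max pooling versus average pooling" topic concrete, here is a minimal plain-Python sketch that applies both to the same small grid. The helper names (`pool2d`, `average`) and the sample values are assumptions chosen for illustration; a real model would use the pooling layers of a framework such as Keras instead.

```python
def pool2d(image, size, reduce_fn):
    """Slide a non-overlapping size x size window over a 2D list and reduce each window."""
    rows, cols = len(image), len(image[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        pooled_row = []
        for c in range(0, cols - size + 1, size):
            # Collect the values covered by the current window.
            window = [image[r + i][c + j] for i in range(size) for j in range(size)]
            pooled_row.append(reduce_fn(window))
        pooled.append(pooled_row)
    return pooled


def average(window):
    """Arithmetic mean of the values in a window."""
    return sum(window) / len(window)


# Hypothetical 4x4 single-channel "image".
image = [
    [1, 3, 2, 0],
    [5, 7, 4, 8],
    [9, 1, 6, 2],
    [3, 3, 0, 4],
]

max_pooled = pool2d(image, 2, max)      # keeps the strongest value in each window
avg_pooled = pool2d(image, 2, average)  # smooths each window to its mean

if __name__ == "__main__":
    print(max_pooled)
    print(avg_pooled)
```

Both variants shrink the 4x4 grid to 2x2. Max pooling keeps only the largest value in each window, while average pooling blends all four values.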

Horovod

  • Distributed Keras/TensorFlow model training
  • Allreduce technique
  • Combine Spark preprocessing with distributed neural network training
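
The allreduce technique listed above can be previewed with a toy, single-process simulation of a ring allreduce, the pattern Horovod builds on. This is only a sketch under simplifying assumptions: the function name is hypothetical, each worker's vector has exactly one element per worker, and values are summed rather than averaged. It does not use Horovod's API.

```python
def ring_allreduce(worker_data):
    """Simulate a ring allreduce (sum) across n workers, each holding n values."""
    n = len(worker_data)
    assert all(len(v) == n for v in worker_data), "toy version: one chunk per worker"
    data = [list(v) for v in worker_data]

    # Phase 1, scatter-reduce: after n - 1 steps, worker w holds the full sum
    # of chunk (w + 1) % n.
    for step in range(n - 1):
        # Snapshot outgoing values so all transfers in a step happen "at once".
        sends = [((w - step) % n, data[w][(w - step) % n]) for w in range(n)]
        for w, (chunk, value) in enumerate(sends):
            data[(w + 1) % n][chunk] += value

    # Phase 2, allgather: circulate the completed chunks around the ring.
    for step in range(n - 1):
        sends = [((w + 1 - step) % n, data[w][(w + 1 - step) % n]) for w in range(n)]
        for w, (chunk, value) in enumerate(sends):
            data[(w + 1) % n][chunk] = value

    return data


if __name__ == "__main__":
    # Hypothetical per-worker gradient vectors.
    gradients = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
    print(ring_allreduce(gradients))
```

Each worker only ever exchanges data with its ring neighbor, yet every worker ends up with the same summed vector. That is why the pattern suits synchronizing gradients across many training processes.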

About your instructor

Amir Issaei is a data science consultant at Databricks, where he educates customers on how to leverage the company’s Unified Analytics Platform in machine learning (ML) projects. He also helps customers implement ML solutions and use advanced analytics to solve business problems. Previously, he worked in the Operations Research Department at American Airlines, where he supported the Customer Planning, Airport, and Customer Analytics Groups. He holds an MS in mathematics from the University of Waterloo and a BE in physics from the University of British Columbia.

Conference registration

Get the Platinum pass or the Training pass to add this course to your package.