14–17 Oct 2019

Using the Azure Cloud to Scale Up Hyperparameter Optimization for Machine Learning

Vanja Paunic (Microsoft)

Hyperparameter optimization for machine learning (ML) models is a complex task that combines multiple training sessions with a search through a high-dimensional parameter space. Despite its complexity and its dependency on the ML algorithm, hyperparameter optimization can be conceptualized as a meta-task that is decoupled from the training sessions. It can therefore be implemented as a generic framework that samples the parameter search space while trying to optimize a user-defined metric. This approach turns hyperparameter optimization into a generic form of process automation applicable to any machine learning algorithm, freeing the data scientist from time-consuming jobs such as writing optimization code and managing repeatable data processing pipelines.
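The decoupling described above can be illustrated with a minimal sketch of random search written against nothing but a training callback. This is not the HyperDrive implementation; the names `sample_config`, `random_search`, and `toy_train`, the search space format, and the toy metric are all assumptions made for illustration.

```python
import random


def sample_config(space, rng):
    """Draw one configuration from the search space.

    Assumed format: a tuple (low, high) is a continuous range,
    a list is a set of discrete choices.
    """
    config = {}
    for name, spec in space.items():
        if isinstance(spec, tuple):
            low, high = spec
            config[name] = rng.uniform(low, high)
        else:
            config[name] = rng.choice(spec)
    return config


def random_search(train_fn, space, n_trials, seed=0):
    """Maximize train_fn over the space; knows nothing about the model itself."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = sample_config(space, rng)
        score = train_fn(**config)  # one full training session per trial
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_train(learning_rate, batch_size):
    """Hypothetical stand-in for a training session that returns a metric."""
    return -(learning_rate - 0.1) ** 2 - 0.001 * abs(batch_size - 32)


space = {"learning_rate": (0.001, 1.0), "batch_size": [16, 32, 64]}
best_config, best_score = random_search(toy_train, space, n_trials=50)
```

The optimizer only sees a search space and a metric, so swapping `toy_train` for an object detection or text matching training routine would not require changing the search code.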

Here we show how hyperparameter optimization can be performed in a transparent, scalable, and easy-to-manage way using the Azure HyperDrive service. We focus on object detection and text matching, two common machine learning scenarios for image and natural language processing (NLP), both implemented here using open source frameworks. For object detection we use the Faster R-CNN algorithm [1], a state-of-the-art deep learning algorithm for that task. We provide two implementations of it, one based on the TensorFlow Object Detection API [2,3] and one based on torchvision [4], two open source frameworks that are de facto standards for constructing and training object detection models. The text matching task includes text data featurization and is implemented via a scikit-learn pipeline.

All three implementations use Azure HyperDrive [5] for hyperparameter optimization through the Azure Machine Learning (AzureML) Python SDK [6]. The key points are decoupling AzureML dependencies via Conda environment files; constructing implementation-specific but platform-agnostic Dockerfiles and corresponding Docker images for easy reproducibility; containerized data preprocessing on elastically allocated AzureML compute targets; and finally using AzureML HyperDrive for hyperparameter tuning while training the deep learning or NLP models. The end-to-end process is completed by showing how the tuned models are deployed at scale on Azure Kubernetes clusters and consumed by a Jupyter notebook widget client.
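As a rough sketch of what the tuning step might look like with the AzureML Python SDK (v1), the function below wraps an existing `ScriptRunConfig` in a HyperDrive sweep. It is not the configuration used in this work: the function name `build_hyperdrive_config`, the argument names `--learning_rate` and `--batch_size`, the ranges, and the metric name `validation_accuracy` are assumptions, and the training script is assumed to log that metric.

```python
import math


def build_hyperdrive_config(run_config, max_total_runs=20, max_concurrent_runs=4):
    """Build a HyperDrive sweep around an existing ScriptRunConfig (sketch)."""
    # Imported lazily so this module loads even without the AzureML SDK installed.
    from azureml.train.hyperdrive import (
        BanditPolicy,
        HyperDriveConfig,
        PrimaryMetricGoal,
        RandomParameterSampling,
        choice,
        loguniform,
    )

    # Hypothetical search space; the keys must match the script's arguments.
    sampling = RandomParameterSampling({
        # loguniform takes the bounds of the exponent, hence math.log.
        "--learning_rate": loguniform(math.log(1e-5), math.log(1e-2)),
        "--batch_size": choice(8, 16, 32),
    })

    # Stop runs that fall more than 10% behind the best run,
    # starting after two evaluation intervals.
    policy = BanditPolicy(evaluation_interval=1, slack_factor=0.1, delay_evaluation=2)

    return HyperDriveConfig(
        run_config=run_config,
        hyperparameter_sampling=sampling,
        policy=policy,
        primary_metric_name="validation_accuracy",  # assumed metric name
        primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
        max_total_runs=max_total_runs,
        max_concurrent_runs=max_concurrent_runs,
    )
```

The returned object would typically be submitted through an AzureML `Experiment`, with the compute target and environment already set on the `ScriptRunConfig` passed in.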

The above steps provide an easily reproducible recipe for building custom object detection or NLP models. Via transfer learning, the two object detection implementations can be extended to a large class of object detection problems: publicly available pre-trained models are further refined by training on smaller custom datasets to solve customer-specific problems that involve identifying multiple objects in images. For the text matching example, the main steps, such as question selection, labeling, and data featurization, are reusable for other similar NLP and generic ML problems. The key point of this presentation is showing how HyperDrive can be used in an auto-scalable, generic, out-of-the-box fashion to automatically optimize the hyperparameter settings for heterogeneous and completely independent groups of applications by employing advanced features of the hyperparameter optimization framework, including early termination policies and random, grid, or Bayesian optimization.
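To make the idea of an early termination policy concrete, here is a simplified sketch of a bandit-style rule with a slack factor: a run is stopped once its best metric falls below the leading run's best metric divided by (1 + slack factor). It is a toy model of the concept, not the HyperDrive code; the function names and the learning curves are invented for illustration, and the metric is assumed to be positive and maximized.

```python
def bandit_should_stop(run_metric, best_metric, slack_factor):
    """True if the run is outside the allowed slack of the best run (maximize)."""
    return run_metric < best_metric / (1.0 + slack_factor)


def apply_bandit_policy(curves, slack_factor=0.1, evaluation_interval=1):
    """Replay equal-length metric curves and report where each run is stopped.

    curves maps a run name to its metric at each reporting step.
    Returns {run name: step index at which it was terminated}.
    """
    n_steps = len(next(iter(curves.values())))
    active = set(curves)
    best_so_far = {name: float("-inf") for name in curves}
    stopped_at = {}
    for step in range(n_steps):
        # Only runs that are still active report new metric values.
        for name in active:
            best_so_far[name] = max(best_so_far[name], curves[name][step])
        if (step + 1) % evaluation_interval != 0:
            continue
        leader = max(best_so_far.values())
        for name in sorted(active):
            if bandit_should_stop(best_so_far[name], leader, slack_factor):
                stopped_at[name] = step
        active -= set(stopped_at)
    return stopped_at


# Hypothetical validation accuracy curves for three runs.
curves = {
    "a": [0.50, 0.70, 0.90],
    "b": [0.48, 0.55, 0.60],
    "c": [0.30, 0.35, 0.40],
}
stopped = apply_bandit_policy(curves, slack_factor=0.1)
```

In this toy replay, run "c" is stopped at the first check and run "b" at the second, so only the leading run consumes compute for all three steps.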

This work is published [7] as AI Azure Global Reference Architectures, with complete source code available in public Git repositories.

1. “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks”, Shaoqing Ren, Kaiming He, Ross B. Girshick, Jian Sun, IEEE Transactions on Pattern Analysis and Machine Intelligence 2015

1. “Mask R-CNN”, K. He, G. Gkioxari, P. Dollár, R. Girshick, arXiv:1703.06870, 2017

2. “Speed/accuracy trade-offs for modern convolutional object detectors”, Huang J, Rathod V, Sun C, Zhu M, Korattikara A, Fathi A, Fischer I, Wojna Z, Song Y, Guadarrama S, Murphy K, CVPR 2017

3. https://github.com/tensorflow/models/tree/master/research/object_detection

4. https://github.com/pytorch/vision

5. https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.hyperdrive?view=azure-ml-py

6. https://docs.microsoft.com/en-us/python/api/overview/azure/ml/intro?view=azure-ml-py

7. https://docs.microsoft.com/en-us/azure/architecture/reference-architectures

Vanja Paunic


Vanja Paunic is a data scientist in the Algorithms and Data Science Group at Microsoft London. She works on building machine learning solutions with external companies utilizing Microsoft’s AI Cloud Platform. She holds a PhD in computer science with a focus on data mining in the biomedical domain from the University of Minnesota.
