Mar 15–18, 2020

Scalable and automated pipeline for large-scale neural network training and inference

Ebrahim Safavi (Mist Systems), Jisheng Wang (Mist Systems)
4:15pm–4:55pm Tuesday, March 17, 2020
Location: LL20D

Who is this presentation for?

Data scientists or analysts

Level

Beginner

Description

Anomaly detection models are essential to running a data-driven business intelligently. To manage tens of thousands of anomaly detection models, Mist Systems built a cloud native, scalable ML training pipeline that automates every step of ML operations, including data collection, model training, model validation, model deployment, and version control. The inference workflow is decoupled from the training process to increase agility and minimize model-serving delay.

Ebrahim Safavi and Jisheng Wang detail the unsupervised confident deep multivariate models Mist Systems built to automatically detect WiFi network issues. They dive into the details of the company’s cloud-based pipeline and how it uses relative entropy to automate the training workflow. You’ll also learn how to productize and monitor thousands of ML models to automate anomaly detection.

Motivated by the recent impressive performance of recurrent neural networks (RNNs) on a wide spectrum of tasks, Mist Systems developed confident deep bidirectional long short-term memory (BiLSTM) models that leverage a large amount of data across numerous dimensions to capture trends, catch anomalies across thousands of WiFi networks, and address issues in real time. The proposed BiLSTM models can predict the uncertainty of their detections, which is essential for anomaly detection.
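To illustrate why a predicted uncertainty matters for anomaly detection, here is a minimal sketch in Python. It assumes a model that outputs a predicted mean and a predicted standard deviation for each time step; the function name flag_anomalies, the threshold k, and the floor min_std are hypothetical and are not taken from the talk. It is not Mist Systems' model, only one simple way such an uncertainty estimate could be used.

```python
from typing import List, Sequence


def flag_anomalies(
    observed: Sequence[float],
    predicted_mean: Sequence[float],
    predicted_std: Sequence[float],
    k: float = 3.0,          # hypothetical threshold, in predicted standard deviations
    min_std: float = 1e-6,   # hypothetical floor to avoid division by zero
) -> List[bool]:
    """Flag points whose deviation from the prediction exceeds k predicted std devs."""
    if not (len(observed) == len(predicted_mean) == len(predicted_std)):
        raise ValueError("all inputs must have the same length")
    flags = []
    for x, mu, sigma in zip(observed, predicted_mean, predicted_std):
        sigma = max(sigma, min_std)
        # The same absolute error counts as anomalous only when the model is confident.
        flags.append(abs(x - mu) / sigma > k)
    return flags


# Same observed value and same predicted mean, but different predicted uncertainty.
print(flag_anomalies([10.0, 10.0], [7.0, 7.0], [0.5, 2.0]))
```

In this sketch the first point is flagged because the model is confident (small standard deviation), while the second is not, even though the absolute error is identical.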

In addition, to address the challenges imposed by the stochastic nature of unsupervised anomaly detection on the workflow pipeline, the company developed novel statistical models for the training workflow to leverage historical data and automate model validation, deployment, and version control.
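The description mentions relative entropy as part of automating the training workflow but does not say how it is applied. As one possible reading, the sketch below compares a histogram of a newly trained model's outputs against a historical reference and gates promotion on the Kullback-Leibler divergence between them. The function names, the smoothing constant eps, and the threshold max_divergence are all assumptions made for illustration.

```python
import math
from typing import Sequence


def relative_entropy(p: Sequence[float], q: Sequence[float], eps: float = 1e-9) -> float:
    """Relative entropy (KL divergence) D(p || q) in nats for two histograms."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    # Smooth and renormalise so empty bins never produce log(0) or division by zero.
    p_s = [x + eps for x in p]
    q_s = [x + eps for x in q]
    p_sum, q_sum = sum(p_s), sum(q_s)
    return sum(
        (a / p_sum) * math.log((a / p_sum) / (b / q_sum))
        for a, b in zip(p_s, q_s)
    )


def should_promote(
    candidate_hist: Sequence[float],
    reference_hist: Sequence[float],
    max_divergence: float = 0.1,  # hypothetical threshold, not from the talk
) -> bool:
    """Promote the candidate only if its output distribution stays close to history."""
    return relative_entropy(candidate_hist, reference_hist) <= max_divergence


# Hypothetical histograms of model outputs over the same bins.
print(should_promote([10, 20, 30], [11, 19, 30]))
print(should_promote([0.5, 0.5], [0.9, 0.1]))
```

A check like this needs no labels, which is why a distribution-level statistic is a plausible fit for validating unsupervised models.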

The anomaly detection service runs hourly and the training jobs run weekly through the pipeline, which includes steps for managing the training and serving data streams, versioning models for prediction, and training and serving each network’s model. The pipeline uses several technologies, including Secor, Amazon S3, Apache Spark on Amazon EMR clusters, Apache Kafka, and Elasticsearch.
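The decoupling of weekly training from hourly serving can be pictured with a small per-network model registry. The sketch below is an in-memory stand-in written for illustration only: the class names, the method names, and the example artifact paths are hypothetical, and the actual pipeline's storage and versioning scheme is not described in this abstract.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ModelVersion:
    version: int
    artifact_uri: str        # hypothetical location of the trained model artifact
    validated: bool = False  # set once automated validation passes


@dataclass
class ModelRegistry:
    _versions: Dict[str, List[ModelVersion]] = field(default_factory=dict)

    def register(self, network_id: str, artifact_uri: str) -> ModelVersion:
        """Called by the (weekly) training job to record a new, unvalidated version."""
        versions = self._versions.setdefault(network_id, [])
        model = ModelVersion(version=len(versions) + 1, artifact_uri=artifact_uri)
        versions.append(model)
        return model

    def mark_validated(self, network_id: str, version: int) -> None:
        """Called after validation; only validated versions become servable."""
        for model in self._versions.get(network_id, []):
            if model.version == version:
                model.validated = True
                return
        raise KeyError(f"no version {version} for network {network_id}")

    def serving_version(self, network_id: str) -> Optional[ModelVersion]:
        """Called by the (hourly) inference job: newest validated version, if any."""
        validated = [m for m in self._versions.get(network_id, []) if m.validated]
        return validated[-1] if validated else None


registry = ModelRegistry()
v1 = registry.register("network-a", "s3://example-bucket/network-a/v1")  # hypothetical path
registry.mark_validated("network-a", v1.version)
registry.register("network-a", "s3://example-bucket/network-a/v2")       # not yet validated
print(registry.serving_version("network-a").version)
```

Because inference reads only the latest validated version, a training run that is still in progress or that fails validation does not affect the model being served.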

Prerequisite knowledge

  • Familiarity with big data technologies (useful but not required)

What you'll learn

  • Learn how to develop high-accuracy multivariate anomaly detection using RNNs
  • Discover how to build and automate scalable cloud native ML pipelines

Ebrahim Safavi

Mist Systems

Ebrahim Safavi is a senior data scientist at Juniper, focusing on knowledge discovery from big data using machine learning and large-scale data mining. He has developed and implemented several key production components, including the company’s chatbot inference engine and anomaly detection. He won a Microsoft research award for his work on information retrieval and recommendation systems in graph-structured networks. Ebrahim earned a PhD in cognitive learning networks from Stevens Institute of Technology.

Jisheng Wang

Mist Systems

Jisheng Wang is the head of data science at Mist Systems, where he leads the development of Marvis, the first AI-driven virtual network assistant that automates the visibility, troubleshooting, reporting, and maintenance of enterprise networking. He has 10+ years of experience applying state-of-the-art big data and data science technologies to solve challenging enterprise problems including security, networking, and IoT. Previously, Jisheng was the senior director of data science in the CTO office of Aruba, a Hewlett-Packard Enterprise company, since its acquisition of Niara in February 2017. There he led the overall innovation and development effort in big data infrastructure and data science and invented the industry’s first modular and data-agnostic User and Entity Behavior Analytics (UEBA) solution, which is widely deployed today among global enterprises. He was also a technical lead at Cisco responsible for various security products. Jisheng earned his PhD in electrical engineering from Penn State University. He’s a frequent speaker at AI and ML conferences, including O’Reilly Strata AI, Frontier AI, Spark Summit, Hadoop Summit, and BlackHat.
