Presented By
O’Reilly + Cloudera
Make Data Work
29 April–2 May 2019
London, UK

LSTM-based time series anomaly detection using Analytics Zoo for Spark and BigDL

Guoqiong Song (Intel)
16:35–17:15 Wednesday, 1 May 2019
Data Science, Machine Learning & AI
Location: Capital Suite 17
Average rating: ***..
(3.40, 5 ratings)

Who is this presentation for?

  • Machine and deep learning practitioners and big data professionals



Prerequisite knowledge

  • A basic understanding of Apache Spark, machine learning, and deep learning

What you'll learn

  • How to use Analytics Zoo on BigDL and Apache Spark, and how to apply deep learning techniques to real-world use cases such as anomaly detection and fraud detection


Collecting and processing massive time series data (e.g., logs and sensor readings) and detecting anomalies in real time is critical for many emerging smart systems in areas such as industry, manufacturing, AIOps, and the IoT.

Long short-term memory networks (LSTMs) have proven effective on a variety of time series analysis tasks. They capture temporal information by learning the dynamics of sequences through recurrent connections (cycles in the network of nodes). LSTMs can be readily built using any of today’s deep learning packages. However, most popular deep learning libraries use Python as their native language and run on GPU clusters to achieve state-of-the-art performance, which presents a real challenge when moving models into production.

Guoqiong Song explains how to apply time series anomaly detection to big data at scale, using the end-to-end Spark and BigDL pipeline provided by Analytics Zoo. You’ll learn how to build the end-to-end flow on standard Hadoop/Spark clusters: preprocessing the raw time series data and extracting features, training an LSTM-based anomaly detector model, and evaluating the model and its anomaly detection results. This solution has been applied at Yunda, Travelsky, and Baosight, among others.
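
The shape of such a flow can be sketched in a few lines of plain NumPy. This is only an illustration under stated assumptions, not the Analytics Zoo API or the speaker's implementation: the helper names `unroll` and `detect_anomalies`, the synthetic sine-wave data, and the window length are all hypothetical, and a simple rolling-median predictor stands in for the trained LSTM so the example runs without Spark or BigDL.

```python
import numpy as np


def unroll(series, unroll_length):
    """Slice a 1-D series into (window, next value) pairs for supervised training."""
    series = np.asarray(series, dtype=float)
    n = len(series) - unroll_length
    windows = np.stack([series[i:i + unroll_length] for i in range(n)])
    targets = series[unroll_length:]
    return windows, targets


def detect_anomalies(y_true, y_pred, anomaly_size):
    """Return the target indices with the `anomaly_size` largest absolute errors."""
    errors = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    return np.sort(np.argsort(errors)[-anomaly_size:])


# Synthetic "sensor" data (an assumption): a sine wave with one injected spike.
t = np.arange(200)
series = np.sin(2 * np.pi * t / 25.0)
series[120] += 5.0

# Preprocessing: standardize the raw series.
scaled = (series - series.mean()) / series.std()

# Feature extraction: each sample is the previous `unroll_length` readings.
unroll_length = 10
windows, targets = unroll(scaled, unroll_length)

# Placeholder predictor standing in for the LSTM: the median of each window.
predictions = np.median(windows, axis=1)

# Evaluation: flag the point whose prediction error is largest.
flagged = detect_anomalies(targets, predictions, anomaly_size=1)

# Map target indices back to positions in the original series.
anomaly_positions = flagged + unroll_length
print(anomaly_positions)
```

In a real deployment the median predictor would be replaced by the trained LSTM model, and the windowing and scoring steps would run distributed across the Spark cluster rather than on a single in-memory array.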

Guoqiong Song


Guoqiong Song is a senior deep learning software engineer on the big data technology team at Intel. She’s interested in developing and optimizing distributed deep learning algorithms on Spark. She holds a PhD in atmospheric and oceanic sciences with a focus on numerical modeling and optimization from UCLA.