Presented By
O’Reilly + Cloudera
Make Data Work
March 25-28, 2019
San Francisco, CA

Anomaly detection using deep learning to measure the quality of large datasets

Sridhar Alla (BlueWhale), Syed Nasar (Cloudera)
2:40pm-3:20pm Thursday, March 28, 2019
Average rating: 2.86 (7 ratings)

Who is this presentation for?

  • Data engineers, scientists, and technical managers



Prerequisite knowledge

  • Familiarity with machine learning and Python

What you'll learn

  • Learn how deep learning can be applied to ensure data quality in a data processing/modeling pipeline


The value you derive from data depends on the effectiveness of your models (whether predictive analytics, clustering, Bayesian belief networks, or regression models), which depends directly on the features used, which in turn depend on the input data sources consumed. To guard against poor-quality input data, Sridhar Alla and Syed Nasar implemented modules that define the properties of the data being consumed, detect anomalies in the data, and report them, enabling stakeholders to take corrective action.

Sridhar and Syed explore deep learning neural networks and share techniques you can use to detect anomalies in data. Specifically, they demonstrate how to use NVIDIA GPUs, Keras, and TensorFlow with Python 3.6 to push the limits on how much data can be profiled and scanned for anomalies, and explain how they applied similar techniques to time series data using LSTM networks. You’ll also learn about deep learning-based autoencoders, unsupervised clustering, and density-based methods.

Code will be shared in a Jupyter notebook to help you implement a similar strategy in your organization.
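
As a rough illustration of the density-based idea mentioned in the abstract (this is not the speakers' notebook code), the sketch below scores each record by its mean distance to its k nearest neighbors and flags records whose score is far above the typical score. It uses only the Python standard library rather than Keras or TensorFlow. The function names `knn_anomaly_scores` and `flag_anomalies`, the sample data, and the "3 times the median score" rule are all assumptions made for illustration.

```python
import math
import statistics


def knn_anomaly_scores(points, k=2):
    """Score each point by its mean Euclidean distance to its k nearest neighbors.

    Points in sparse regions get large scores. (Hypothetical helper, brute force O(n^2).)
    """
    if k < 1 or k >= len(points):
        raise ValueError("k must satisfy 1 <= k < len(points)")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(
            math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
            for j, q in enumerate(points)
            if j != i
        )
        scores.append(sum(dists[:k]) / k)
    return scores


def flag_anomalies(scores, factor=3.0):
    """Flag scores larger than factor * median score (an assumed, illustrative rule)."""
    threshold = factor * statistics.median(scores)
    return [s > threshold for s in scores]


# Made-up sample: five records clustered near the origin and one far away.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05), (5.0, 5.0)]
scores = knn_anomaly_scores(points, k=2)
flags = flag_anomalies(scores, factor=3.0)
print(list(zip(points, flags)))
```

The brute-force neighbor search is only practical for small samples; profiling large datasets, as the session describes, would call for an indexed or GPU-accelerated approach.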


Sridhar Alla


Sridhar Alla is cofounder and CTO at BlueWhale, which brings together the worlds of big data and artificial intelligence to provide comprehensive solutions to meet the business needs of organizations of all sizes. He and his team are cloud and tool agnostic and strive to embed themselves into the workstream to provide strategic and technical assistance, with solutions such as predictive modeling and analytics, capacity planning, forecasting, anomaly detection, advanced NLP, chatbot development, SAS to Python migration, and deep learning-based model building and operationalization. Sridhar is also the author of three books and an avid presenter at conferences including Strata, Hadoop World, Spark Summit, and others.


Syed Nasar


Syed Nasar is a solutions architect at Cloudera. As a big data and machine learning professional, his expertise extends to artificial intelligence, machine learning, and computer vision, and he has worked with a number of enterprises in bridging big data technologies with advanced statistical analysis, machine learning, and deep learning to create high-quality data products and intelligent systems that drive strategy and investment decisions. Syed is a founder of the Nashville Artificial Intelligence Society. His research interests include NLP, deep learning (mainly RNN and GAN), distributed systems, machine learning at scale, and emerging technologies. He holds a master’s degree in interactive intelligence from the Georgia Institute of Technology.

Comments on this page are now closed.


priyanka naikade |
04/21/2019 4:26pm PDT

Thank you very much!

Syed Nasar |
04/20/2019 7:59pm PDT

The presentation is available here:

priyanka naikade |
04/20/2019 5:27pm PDT

Is there any way I can get access to this presentation or tutorial video? Could you please share the link to this presentation? I’m a student at Western University, London, Ontario. I’m keen on exploring this topic.