Anomaly detection using deep learning to measure the quality of large datasets
Who is this presentation for?
- Data engineers, scientists, and technical managers
Any business, big or small, depends on analytics, whether the goal is generating revenue, reducing churn, or supporting sales and marketing. Whatever algorithms and techniques are used, the results depend on the accuracy and consistency of the data being processed. This session looks at techniques for evaluating data quality and detecting anomalies in the data.
Sridhar Alla walks you through deep learning neural networks and various techniques you can use to detect anomalies in data. Whatever ML algorithms and modeling techniques you use to derive value from data (predictive analytics, clustering, Bayesian belief networks, regression models, and so on), the effectiveness of the models ultimately depends on the features used, which in turn depend on the input data sources consumed. To address this, modules were implemented that define the properties of the data being consumed, detect anomalies in it, report them, and enable stakeholders to discuss the findings and take corrective action.
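The description does not show how those profiling modules are built. As a minimal sketch, assuming rows arrive as Python dicts of numeric columns, a profile can record each column's null rate, mean, and standard deviation from a trusted reference batch, and each new batch can then be checked against it. The names `build_profile` and `check_batch` and the thresholds `max_null_rate_increase` and `z_limit` are hypothetical choices for illustration, not the session's code.

```python
import statistics


def build_profile(rows, columns):
    """Record simple per-column properties from a trusted reference batch.

    Assumes each column has at least one non-null numeric value.
    """
    profile = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        profile[col] = {
            "null_rate": 1 - len(present) / len(values),
            "mean": statistics.mean(present),
            "stdev": statistics.pstdev(present),
        }
    return profile


def check_batch(rows, profile, max_null_rate_increase=0.05, z_limit=4.0):
    """Compare a new (non-empty) batch with the profile; return readable issues."""
    issues = []
    for col, props in profile.items():
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        # Rule 1 (hypothetical): the share of missing values should not jump.
        null_rate = 1 - len(present) / len(values)
        if null_rate > props["null_rate"] + max_null_rate_increase:
            issues.append(
                f"{col}: null rate {null_rate:.1%} vs "
                f"{props['null_rate']:.1%} in reference"
            )
        # Rule 2 (hypothetical): flag values far from the reference mean,
        # measured in reference standard deviations.
        for i, v in enumerate(values):
            if v is None:
                continue
            if props["stdev"] == 0:
                outlier = v != props["mean"]
            else:
                outlier = abs(v - props["mean"]) / props["stdev"] > z_limit
            if outlier:
                issues.append(f"{col}: row {i} has unusual value {v}")
    return issues


reference = [{"amount": 100 + i % 10, "qty": 1 + i % 3} for i in range(100)]
profile = build_profile(reference, ["amount", "qty"])
new_batch = [
    {"amount": 105, "qty": 2},
    {"amount": 500, "qty": 2},     # implausible amount
    {"amount": 104, "qty": None},  # missing quantity
]
for issue in check_batch(new_batch, profile):
    print(issue)
```

The returned list is the kind of report the description mentions handing to stakeholders. Per-value z-scores are a deliberately simple rule and assume roughly unimodal numeric columns.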
Sridhar shows how NVIDIA GPUs, Keras, and TensorFlow (with Python 3.6) have pushed the limits on the amount of data that can be profiled and screened for anomalies. Similar techniques were applied to time series data, particularly using LSTMs. You'll learn about deep learning-based autoencoders, unsupervised clustering, and density-based methods. Sridhar also walks through code in a Jupyter notebook to show how you can implement a similar strategy in your organization.
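The session builds its autoencoders with Keras and TensorFlow; the sketch below is a deliberately tiny, dependency-free stand-in that shows the same idea. An autoencoder learns to compress and reconstruct normal data, so rows it reconstructs poorly are anomaly candidates. This version assumes a single linear bottleneck unit trained by full-batch gradient descent; the function names, learning rate, synthetic data, and the mean-plus-three-standard-deviations threshold are all assumptions for illustration, not the talk's implementation.

```python
import random
import statistics


def train_linear_autoencoder(rows, epochs=500, lr=0.05, seed=0):
    """Fit a toy autoencoder with one linear bottleneck unit (hypothetical helper).

    encode: z = w_enc . (x - means);  decode: x_hat = means + w_dec * z.
    Weights are fit by full-batch gradient descent on the squared
    reconstruction error, averaged over rows.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    rng = random.Random(seed)
    w_enc = [rng.uniform(-0.5, 0.5) for _ in range(d)]
    w_dec = [rng.uniform(-0.5, 0.5) for _ in range(d)]
    for _ in range(epochs):
        g_enc, g_dec = [0.0] * d, [0.0] * d
        for r in rows:
            x = [r[j] - means[j] for j in range(d)]
            z = sum(w_enc[j] * x[j] for j in range(d))
            err = [w_dec[j] * z - x[j] for j in range(d)]
            # Backpropagate the squared error through the decoder, then the encoder.
            dz = 2 * sum(err[j] * w_dec[j] for j in range(d))
            for j in range(d):
                g_dec[j] += 2 * err[j] * z
                g_enc[j] += dz * x[j]
        for j in range(d):
            w_enc[j] -= lr * g_enc[j] / n
            w_dec[j] -= lr * g_dec[j] / n
    return means, w_enc, w_dec


def reconstruction_errors(rows, model):
    """Mean squared reconstruction error per row; high values suggest anomalies."""
    means, w_enc, w_dec = model
    d = len(means)
    errors = []
    for r in rows:
        x = [r[j] - means[j] for j in range(d)]
        z = sum(w_enc[j] * x[j] for j in range(d))
        errors.append(sum((w_dec[j] * z - x[j]) ** 2 for j in range(d)) / d)
    return errors


# Synthetic "normal" data (an assumption): points near the line y = 2x.
rng = random.Random(42)
normal = []
for _ in range(200):
    t = rng.uniform(-1, 1)
    normal.append([t, 2 * t + rng.gauss(0, 0.05)])

model = train_linear_autoencoder(normal)
train_errors = reconstruction_errors(normal, model)
threshold = statistics.mean(train_errors) + 3 * statistics.pstdev(train_errors)

suspects = [[0.5, 1.0], [0.5, -1.0]]  # the second point breaks the y = 2x pattern
for row, err in zip(suspects, reconstruction_errors(suspects, model)):
    print(row, "anomaly" if err > threshold else "ok")
```

A Keras version would typically stack `Dense` layers with nonlinear activations in the encoder and decoder, but the scoring step (compare each row with its reconstruction and threshold the error) stays the same.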
Prerequisite knowledge
- Familiarity with machine learning and Python
What you'll learn
- Learn about the application of deep learning to the problem of ensuring data quality in a data processing and modeling pipeline
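To picture the anomaly check as a step in a data pipeline, here is a minimal sketch of a quality gate built on one simple density-based method: scoring each incoming row by its mean distance to its k nearest neighbors in a trusted reference set (a sparse neighborhood means low density). The detector choice, the function names, `k`, and the `max_anomaly_fraction` cutoff are all assumptions for illustration; a deep learning scorer such as an autoencoder's reconstruction error could replace `knn_score` without changing the gate logic.

```python
import math


def knn_score(row, reference, k=5):
    """Mean Euclidean distance from `row` to its k nearest reference rows.

    A large score means the row sits in a low-density region of the data.
    """
    distances = sorted(
        math.sqrt(sum((a - b) ** 2 for a, b in zip(row, ref))) for ref in reference
    )
    nearest = distances[:k]
    return sum(nearest) / len(nearest)


def calibrate_threshold(reference, k=5, quantile=0.99):
    """Score each reference row against the others (leave-one-out); take a high quantile."""
    scores = sorted(
        knn_score(row, reference[:i] + reference[i + 1:], k)
        for i, row in enumerate(reference)
    )
    return scores[min(len(scores) - 1, int(quantile * len(scores)))]


def quality_gate(batch, reference, threshold, k=5, max_anomaly_fraction=0.05):
    """Return (passed, flagged_indices); fail the batch if too many rows look anomalous."""
    flagged = [
        i for i, row in enumerate(batch) if knn_score(row, reference, k) > threshold
    ]
    passed = len(flagged) / len(batch) <= max_anomaly_fraction
    return passed, flagged


# Hypothetical trusted reference data: a regular 10 x 10 grid of points.
reference = [(x * 0.1, y * 0.1) for x in range(10) for y in range(10)]
threshold = calibrate_threshold(reference)

passed, flagged = quality_gate([(0.45, 0.45), (0.2, 0.3), (5.0, 5.0)], reference, threshold)
print(passed, flagged)
```

Calibrating on leave-one-out scores keeps each reference row from matching itself at distance zero. The brute-force neighbor search scans the whole reference set for every row, which is fine for a sketch but would need an indexed or GPU-accelerated search at the data volumes the session describes.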
Sridhar Alla is cofounder and CTO at BlueWhale, which brings together the worlds of big data and artificial intelligence to provide comprehensive solutions to meet the business needs of organizations of all sizes. He and his team are cloud- and tool-agnostic and strive to embed themselves into the workstream to provide strategic and technical assistance, with solutions such as predictive modeling and analytics, capacity planning, forecasting, anomaly detection, advanced NLP, chatbot development, SAS to Python migration, and deep learning-based model building and operationalization. Sridhar is also the author of three books and an avid presenter at conferences including Strata, Hadoop World, Spark Summit, and others.
For slides and notebooks, see https://github.com/blue-whale-one/strataAICon2019