Anomaly detection using deep learning to measure the quality of large datasets
Who is this presentation for?
- Data engineers, scientists, and technical managers
Level
Advanced
Description
Any business, big or small, depends on analytics, whether the goal is revenue generation, churn reduction, or sales and marketing. No matter which algorithms and techniques are used, the result depends on the accuracy and consistency of the data being processed. Take a look at some techniques for evaluating the quality of data and detecting anomalies in it.
Sridhar Alla walks you through deep learning neural networks and various techniques you can use to detect anomalies in data. Whatever ML algorithms and modeling techniques are used to derive value from data (predictive analytics, clustering, Bayesian belief networks, or regression models), the effectiveness of the models ultimately depends on the features used, which in turn depend on the input data sources consumed. To address this, modules were implemented to define the properties of the data being consumed, detect anomalies in that data, report them, and enable stakeholders to discuss and take corrective action.
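The idea of declaring the expected properties of incoming data and reporting violations can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the modules described in the talk: the names `ColumnProfile` and `check_column`, the sample column, and its limits are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class ColumnProfile:
    """Hypothetical declaration of what a healthy column looks like."""
    name: str
    min_value: float
    max_value: float
    max_null_rate: float  # allowed fraction of missing values, 0.0 to 1.0


def check_column(profile: ColumnProfile,
                 values: Sequence[Optional[float]]) -> List[str]:
    """Return human-readable issues found in one batch of a column."""
    if not values:
        return [f"{profile.name}: no rows received"]

    issues = []

    # Property 1: the share of missing values must stay under the limit.
    null_rate = sum(v is None for v in values) / len(values)
    if null_rate > profile.max_null_rate:
        issues.append(
            f"{profile.name}: null rate {null_rate:.2f} exceeds "
            f"{profile.max_null_rate:.2f}"
        )

    # Property 2: every present value must fall inside the declared range.
    out_of_range = [
        v for v in values
        if v is not None and not (profile.min_value <= v <= profile.max_value)
    ]
    if out_of_range:
        issues.append(
            f"{profile.name}: {len(out_of_range)} value(s) outside "
            f"[{profile.min_value}, {profile.max_value}]"
        )
    return issues


# Illustrative batch: two missing values and one implausible age.
profile = ColumnProfile(name="age", min_value=0, max_value=120, max_null_rate=0.1)
batch = [34, 51, None, 29, 430, 41, None, 38]
report = check_column(profile, batch)
for line in report:
    print(line)
```

A report like this is the piece stakeholders would review before deciding on corrective action. Rule-based checks only catch problems someone anticipated, which is where learned detectors come in.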
Sridhar showcases how NVIDIA GPUs, Keras, and TensorFlow with Python 3.6 have pushed the limits on the amount of data that can be profiled and checked for anomalies. Similar techniques were applied to time series data, particularly using LSTMs. You’ll learn about deep learning-based autoencoders, unsupervised clustering, and density-based methods. Sridhar walks through code in a Jupyter notebook to show how you can implement a similar strategy in your organization.
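The principle behind autoencoder-based detection is that a model trained to reconstruct normal records reconstructs unusual ones poorly, so the reconstruction error serves as an anomaly score. The sketch below is a deliberately tiny stand-in, not the speaker's notebook code: it uses pure Python instead of Keras, a single linear bottleneck unit with tied weights, and synthetic two-feature data. The data generator, the learning rate, and the mean-plus-three-standard-deviations threshold are all assumptions made for illustration.

```python
import random
from statistics import mean, pstdev


def make_normal_data(n=200, noise=0.05, seed=0):
    """Synthetic 'clean' records (assumed): two features with y close to 2x."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        t = -1.0 + 2.0 * i / (n - 1)
        data.append((t + rng.gauss(0, noise), 2.0 * t + rng.gauss(0, noise)))
    return data


def score(w, x):
    """Squared reconstruction error of x; large values suggest an anomaly."""
    z = w[0] * x[0] + w[1] * x[1]      # encode: 2 features -> 1 number
    r = (z * w[0], z * w[1])           # decode with the same (tied) weights
    return (r[0] - x[0]) ** 2 + (r[1] - x[1]) ** 2


def train(data, epochs=300, lr=0.05):
    """Full-batch gradient descent on the mean reconstruction error."""
    w = [0.5, 0.5]
    for _ in range(epochs):
        grad = [0.0, 0.0]
        for x in data:
            z = w[0] * x[0] + w[1] * x[1]
            e = (z * w[0] - x[0], z * w[1] - x[1])   # reconstruction residual
            e_dot_w = e[0] * w[0] + e[1] * w[1]
            for k in range(2):
                # d/dw_k of ||z*w - x||^2, where z = w.x
                grad[k] += 2.0 * (e_dot_w * x[k] + z * e[k])
        for k in range(2):
            w[k] -= lr * grad[k] / len(data)
    return w


train_data = make_normal_data()
weights = train(train_data)

# Threshold taken from the error distribution on normal data (an assumption).
train_scores = [score(weights, x) for x in train_data]
threshold = mean(train_scores) + 3 * pstdev(train_scores)


def is_anomaly(x):
    return score(weights, x) > threshold


print(is_anomaly((0.5, 1.0)))    # follows the learned y = 2x pattern
print(is_anomaly((1.0, -2.0)))   # breaks the pattern
```

A Keras model would replace the single linear unit with stacked nonlinear layers (or LSTM layers for time series), but the scoring and thresholding steps would keep the same shape.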
Prerequisite knowledge
- Familiarity with machine learning and Python
What you'll learn
- Learn about the application of deep learning to the problem of ensuring data quality in a data processing and modeling pipeline
Sridhar Alla
BlueWhale
Sridhar Alla is cofounder and CTO at BlueWhale, which brings together the worlds of big data and artificial intelligence to provide comprehensive solutions to meet the business needs of organizations of all sizes. He and his team are cloud and tool agnostic and strive to embed themselves into the workstream to provide strategic and technical assistance, with solutions such as predictive modeling and analytics, capacity planning, forecasting, anomaly detection, advanced NLP, chatbot development, SAS to Python migration, and deep learning-based model building and operationalization. Sridhar is also the author of three books and an avid presenter at conferences including Strata, Hadoop World, Spark Summit and others.
Comments
For slides and notebooks, see https://github.com/blue-whale-one/strataAICon2019