Presented By O'Reilly and Cloudera
Make Data Work
Dec 4–5, 2017: Training
Dec 5–7, 2017: Tutorials & Conference

Unraveling data with Spark using deep learning and other algorithms from machine learning

Vartika Singh (Cloudera), Jeffrey Shmain (Cloudera)
1:30pm–5:00pm Tuesday, December 5, 2017
Machine Learning, Spark and beyond
Location: 321/322

Who is this presentation for?

  • Data scientists and analysts, programmers, and software engineers

Prerequisite knowledge

  • Experience with machine learning, Scala, Java, and Python

Materials or downloads needed in advance

  • A laptop

What you'll learn

  • Approaches to applying machine learning and deep learning algorithms with Spark and other open source libraries


Data analysis has come a long way in handling both the size and the complexity of data. Vartika Singh and Jeffrey Shmain walk you through various approaches to unraveling the underlying patterns in data using Spark, machine learning, and related technologies. Along the way, Vartika and Jeff discuss common issues encountered as data and model sizes grow and demonstrate how to solve analytical problems using the deep learning frameworks Caffe and TensorFlow on a Spark cluster.

Topics include:

  • Data preprocessing
  • Clustering
  • Classification
  • Deep learning

Vartika Singh


Vartika Singh is a solutions architect at Cloudera with over 12 years of experience applying machine learning techniques to big data problems.

Jeffrey Shmain


Jeff Shmain is a principal solutions architect at Cloudera. He has 16+ years of financial industry experience with a strong understanding of security trading, risk, and regulations. Over the last few years, Jeff has worked on various use-case implementations at 8 of the world’s 10 largest investment banks.