Presented by O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

Unraveling data with Spark using deep learning and other algorithms from machine learning

Vartika Singh (Cloudera), Jeffrey Shmain (Cloudera)
9:00am–12:30pm Tuesday, September 26, 2017
Machine Learning & Data Science, Spark & beyond
Location: 1A 12/14
Level: Intermediate
Secondary topics: Deep learning
Average rating: 2.50 (6 ratings)

Who is this presentation for?

  • Data scientists and analysts, programmers, and software engineers

Prerequisite knowledge

  • Experience with machine learning, Scala, Java, and Python

Materials or downloads needed in advance

  • A laptop with Java, Scala, and Spark installed and configured
  • A GitHub account

What you'll learn

  • Learn approaches for applying machine learning and deep learning algorithms that leverage Spark and other open source libraries


Data analysis has come a long way in dealing with both the size and the complexity of data. Vartika Singh and Jeffrey Shmain walk you through various approaches to unraveling the underlying patterns in data by leveraging Spark, machine learning, and related libraries. Along the way, Vartika and Jeff discuss common issues encountered as data and model sizes grow and demonstrate how to solve analytical problems using the deep learning frameworks Caffe and TensorFlow on a Spark cluster.

Topics include:

  • Clustering
  • Classification
  • Deep learning

Vartika Singh


Vartika Singh is a solutions architect at Cloudera with over 12 years of experience applying machine learning techniques to big data problems.


Jeffrey Shmain


Jeff Shmain is a principal solutions architect at Cloudera. He has 16+ years of financial industry experience with a strong understanding of security trading, risk, and regulations. Over the last few years, Jeff has worked on various use-case implementations at 8 out of 10 of the world’s largest investment banks.



Mohammed Ayub | DATA SCIENTIST
09/26/2017 5:28am EDT

Thanks, Jeffrey!

Jeffrey Shmain
09/25/2017 7:41pm EDT

This tutorial will mostly be done through Cloudera Data Science Workbench, so minimal setup is required to run the examples. However, all of the code and examples are on GitHub and could potentially be run in third-party tools.

Mohammed Ayub | DATA SCIENTIST
09/25/2017 7:30pm EDT

I have the sparkmagic kernel installed for Jupyter Notebook from here:
Will this work for the tutorial?

08/10/2017 6:10am EDT

How much hands-on programming is there in this training?