Presented By O'Reilly and Cloudera
Make Data Work
22–23 May 2017: Training
23–25 May 2017: Tutorials & Conference
London, UK

Unraveling data with Spark using machine learning

Jeffrey Shmain (Cloudera), Jayant Shekhar (Sparkflows Inc.), Vartika Singh (Cloudera)
13:30–17:00 Tuesday, 23 May 2017
Spark & beyond
Location: Capital Suite 2/3
Level: Intermediate
Average rating: 3.50 (4 ratings)

Who is this presentation for?

  • Data scientists and analysts, programmers, and software engineers

Prerequisite knowledge

  • Experience with machine learning, Scala, Java, and Python

Materials or downloads needed in advance

  • A laptop with Java, Scala, and Spark installed and configured
  • A GitHub account

What you'll learn

  • Understand the different kinds of datasets and learn approaches for applying machine-learning algorithms to them


Data analysis has come a long way in terms of both the size and the complexity of data. Vartika Singh, Jayant Shekhar, and Jeffrey Shmain walk you through various approaches to unraveling the underlying patterns in data using Spark, machine learning, and related technologies, helping you deal with the noise of the real world and get the maximum value from your data.

Topics include:

  • Clustering
  • Classification
  • Neural networks

Jeffrey Shmain


Jeff Shmain is a principal solutions architect at Cloudera. He has 16+ years of financial industry experience with a strong understanding of security trading, risk, and regulations. Over the last few years, Jeff has worked on various use-case implementations at 8 out of 10 of the world’s largest investment banks.

Jayant Shekhar

Sparkflows Inc.

Jayant Shekhar is the founder of Sparkflows Inc., which enables machine learning on large datasets using Spark ML and intelligent workflows. Jayant focuses on Spark, streaming, and machine learning and is a contributor to Spark. Previously, Jayant was a principal solutions architect at Cloudera working with companies both large and small in various verticals on big data use cases, architecture, algorithms, and deployments. Prior to Cloudera, Jayant worked at Yahoo, where he was instrumental in building out the large-scale content/listings platform using Hadoop and big data technologies. Jayant also worked at eBay, building out a new shopping platform, K2, using Nutch and Hadoop among others, as well as KLA-Tencor, building software for reticle inspection stations and defect analysis systems. Jayant holds a bachelor’s degree in computer science from IIT Kharagpur and a master’s degree in computer engineering from San Jose State University.

Vartika Singh


Vartika Singh is a solutions architect at Cloudera with over 12 years of experience applying machine learning techniques to big data problems.

Comments on this page are now closed.


Vartika Singh
25/04/2017 19:02 BST

Hello Claudia,

We will primarily use Scala as our hands-on language. Given the code we are using, it should not be hard for you to translate the knowledge to Python.

Claudia Burgard | DATA SCIENTIST
25/04/2017 17:30 BST

As I read under “prerequisites”, “Experience with machine learning, Scala, Java, and Python” will be expected. What language will be the focus of this course? I do not know any Java or Scala, only Python.