Presented By O'Reilly and Cloudera
Make Data Work
March 13–14, 2017: Training
March 14–16, 2017: Tutorials & Conference
San Jose, CA

Unraveling data with Spark using machine learning

Vartika Singh (Cloudera), Jayant Shekhar (Sparkflows Inc.), Jeffrey Shmain (Cloudera)
9:00am–12:30pm Tuesday, March 14, 2017
Spark & beyond
Location: 210 B/F
Level: Intermediate
Average rating: 3.83 (6 ratings)

Who is this presentation for?

  • Data scientists and analysts, programmers, and software engineers

Prerequisite knowledge

  • Experience with machine learning, Scala, Java, and Python

Materials or downloads needed in advance

  • A laptop with Java, Scala, and Spark installed and configured
  • A GitHub account
  • If you would also like to run the source code provided, download the install instructions and code before you arrive onsite to help protect the show bandwidth.

What you'll learn

  • Understand the different kinds of datasets and learn approaches for applying machine-learning algorithms to them


Data analysis has come a long way in terms of both the size and the complexity of data. Vartika Singh, Jayant Shekhar, and Jeffrey Shmain walk you through various approaches to unraveling the underlying patterns in data using Spark, machine learning, and related technologies, helping you deal with the noise of the real world and get the maximum value from your data.

Topics include:

  • Clustering
  • Classification
  • Neural networks

Vartika Singh


Vartika Singh is a solutions architect at Cloudera with over 12 years of experience applying machine learning techniques to big data problems.


Jayant Shekhar

Sparkflows Inc.

Jayant Shekhar is the founder of Sparkflows Inc., which enables machine learning on large datasets using Spark ML and intelligent workflows. Jayant focuses on Spark, streaming, and machine learning and is a contributor to Spark. Previously, Jayant was a principal solutions architect at Cloudera working with companies both large and small in various verticals on big data use cases, architecture, algorithms, and deployments. Prior to Cloudera, Jayant worked at Yahoo, where he was instrumental in building out the large-scale content/listings platform using Hadoop and big data technologies. Jayant also worked at eBay, building out a new shopping platform, K2, using Nutch and Hadoop among others, as well as KLA-Tencor, building software for reticle inspection stations and defect analysis systems. Jayant holds a bachelor’s degree in computer science from IIT Kharagpur and a master’s degree in computer engineering from San Jose State University.


Jeffrey Shmain


Jeff Shmain is a principal solutions architect at Cloudera. He has 16+ years of financial industry experience with a strong understanding of security trading, risk, and regulations. Over the last few years, Jeff has worked on various use-case implementations at 8 of the world’s 10 largest investment banks.