Presented By O'Reilly and Cloudera
Make Data Work
22–23 May 2017: Training
23–25 May 2017: Tutorials & Conference
London, UK

Unraveling data with Spark using machine learning

Jeffrey Shmain (Cloudera), Jayant Shekhar (Sparkflows Inc.), Vartika Singh (Cloudera)
13:30–17:00 Tuesday, 23 May 2017
Spark & beyond
Location: Capital Suite 2/3
Level: Intermediate
Average rating: 3.50 (4 ratings)

Who is this presentation for?

  • Data scientists and analysts, programmers, and software engineers

Prerequisite knowledge

  • Experience with machine learning, Scala, Java, and Python

Materials or downloads needed in advance

  • A laptop with Java, Scala, and Spark installed and configured
  • A GitHub account

What you'll learn

  • Understand the different kinds of datasets and learn approaches for applying machine-learning algorithms to them

Description

Data analysis has come a long way in terms of both the size and the complexity of data. Vartika Singh, Jayant Shekhar, and Jeffrey Shmain walk you through various approaches to unraveling the underlying patterns in data using Spark, machine learning, and related technologies, helping you deal with the noise of the real world and get the maximum value from your data.

Topics include:

  • Clustering
  • Classification
  • Neural networks

Jeffrey Shmain

Cloudera

Jeff Shmain is a principal solutions architect at Cloudera. He has 16+ years of financial industry experience with a strong understanding of security trading, risk, and regulations. Over the last few years, Jeff has worked on various use-case implementations at 8 out of 10 of the world’s largest investment banks.

Jayant Shekhar

Sparkflows Inc.

Jayant Shekhar is the founder of Sparkflows Inc., which enables machine learning on large datasets using Spark ML and intelligent workflows. Jayant focuses on Spark, streaming, and machine learning and is a contributor to Spark. Previously, Jayant was a principal solutions architect at Cloudera working with companies both large and small in various verticals on big data use cases, architecture, algorithms, and deployments. Prior to Cloudera, Jayant worked at Yahoo, where he was instrumental in building out the large-scale content/listings platform using Hadoop and big data technologies. Jayant also worked at eBay, building out a new shopping platform, K2, using Nutch and Hadoop among others, as well as KLA-Tencor, building software for reticle inspection stations and defect analysis systems. Jayant holds a bachelor’s degree in computer science from IIT Kharagpur and a master’s degree in computer engineering from San Jose State University.

Vartika Singh

Cloudera

Vartika Singh is a solutions architect at Cloudera with over 10 years of experience applying machine learning techniques to big data problems.

Comments

Vartika Singh | SOLUTIONS ARCHITECT
25/04/2017 19:02 BST

Hello Claudia,

We will primarily use Scala as our hands-on language. Given the code we are using, it should not be hard for you to translate the knowledge to Python.

Claudia Burgard | DATA SCIENTIST
25/04/2017 17:30 BST

Hello,
Under “prerequisites” I read that “Experience with machine learning, Scala, Java, and Python” will be expected. What language will be the focus of this course? I do not know any Java or Scala, only Python.