Presented By O’Reilly and Cloudera
Make Data Work
March 5–6, 2018: Training
March 6–8, 2018: Tutorials & Conference
San Jose, CA

Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML

Joseph Kambourakis (Databricks)
9:00am–5:00pm Tuesday, March 6, 2018

Prerequisite knowledge

Some familiarity with Apache Spark will be helpful, but core ideas will be reviewed.

What you'll learn

  • Explore Apache Spark 2.0 core concepts with a focus on Spark's machine learning library

Description

Joseph Kambourakis introduces you to Apache Spark 2.0 core concepts with a focus on Spark’s machine learning library, using text mining on real-world data as the primary end-to-end use case. Join Joseph to explore and wrangle data using Spark’s Dataset and DataFrame abstractions. You’ll use the Spark ML API to build an ML pipeline that transforms free text into useful features through Spark ML’s Transformer abstraction (e.g., one-hot encoding and term frequency counting). You'll also learn about model selection, training and fitting, validation and inspection, and hyperparameter tuning with grid search.
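As a rough illustration of the kind of pipeline described above, the following Scala sketch chains Spark ML's `Tokenizer` and `HashingTF` Transformers with a `LogisticRegression` estimator. It then selects a model by grid search using `ParamGridBuilder` and `CrossValidator`. The object name `TextPipelineSketch`, the toy dataset, the choice of logistic regression, and the grid values are illustrative assumptions, not the course's lab material. The sketch assumes Spark 2.x with the `spark-mllib` dependency on the classpath.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.ml.tuning.{CrossValidator, CrossValidatorModel, ParamGridBuilder}
import org.apache.spark.sql.SparkSession

// Hypothetical example object: a sketch of a text-classification pipeline,
// not the course's actual lab code. Assumes Spark 2.x with spark-mllib available.
object TextPipelineSketch {

  def fit(spark: SparkSession): CrossValidatorModel = {
    import spark.implicits._

    // Made-up toy data: label 1.0 if the text mentions Spark, 0.0 otherwise.
    val training = Seq(
      (0L, "spark dataframes make wrangling easy", 1.0),
      (1L, "the weather is cold today", 0.0),
      (2L, "spark ml pipelines chain transformers", 1.0),
      (3L, "arsenal won the match", 0.0),
      (4L, "learning spark with scala", 1.0),
      (5L, "magic cards and deck building", 0.0),
      (6L, "spark sql and datasets", 1.0),
      (7L, "coffee before the morning meeting", 0.0),
      (8L, "tuning spark jobs", 1.0),
      (9L, "a walk in the park", 0.0),
      (10L, "spark streaming basics", 1.0),
      (11L, "dinner with family", 0.0)
    ).toDF("id", "text", "label")

    // Transformer: lowercase the free text and split it on whitespace.
    val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
    // Transformer: hash each word to a vector index and count term frequencies.
    val hashingTF = new HashingTF().setInputCol(tokenizer.getOutputCol).setOutputCol("features")
    // Estimator: trained on the term-frequency feature vectors.
    val lr = new LogisticRegression().setMaxIter(10)

    val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lr))

    // Grid search over two parameters: 2 x 2 = 4 candidate settings.
    val paramGrid = new ParamGridBuilder()
      .addGrid(hashingTF.numFeatures, Array(100, 1000))
      .addGrid(lr.regParam, Array(0.1, 0.01))
      .build()

    // Model selection: 2-fold cross-validation, scored by the evaluator's
    // default metric (area under the ROC curve).
    val cv = new CrossValidator()
      .setEstimator(pipeline)
      .setEvaluator(new BinaryClassificationEvaluator())
      .setEstimatorParamMaps(paramGrid)
      .setNumFolds(2)
      .setSeed(42L)

    cv.fit(training)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("TextPipelineSketch").master("local[*]").getOrCreate()
    import spark.implicits._
    val model = fit(spark)

    // Inspection: average cross-validation metric for each parameter setting.
    model.getEstimatorParamMaps.zip(model.avgMetrics).foreach { case (params, metric) =>
      println(s"$metric <- $params")
    }

    // Apply the best pipeline found by grid search to unseen text.
    val test = Seq((12L, "spark mllib"), (13L, "rainy afternoon")).toDF("id", "text")
    model.transform(test).select("id", "text", "prediction").show()
    spark.stop()
  }
}
```

Wrapping the Transformers and the estimator in a single `Pipeline` lets `CrossValidator` refit every stage for each fold and each parameter setting. This prevents the feature settings (here, `numFeatures`) from being tuned separately from the model.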

The class will consist of approximately 50% hands-on programming labs in Scala and 50% lecture and discussion.


Joseph Kambourakis

Databricks

Joseph Kambourakis is a data science instructor at Databricks. Joseph has more than 10 years of experience teaching, more than five of them in data science and analytics. Previously, Joseph was an instructor at Cloudera and a technical sales engineer at IBM. He has taught in over a dozen countries and has been featured on Japanese television and in Saudi newspapers. He is a rabid Arsenal FC supporter and competitive Magic: The Gathering player. Joseph holds a BS in electrical and computer engineering from Worcester Polytechnic Institute and an MBA with a focus in analytics from Bentley University. He lives with his wife and daughter in Needham, MA.


Comments

Joseph Kambourakis | DATABRICKS
03/10/2018 3:07am PST

The slides can be found here: https://brookewenig.github.io/StrataSJC2018_Joe.html#/

Saran Thangavelu | LEAD & ARCHITECT
03/09/2018 10:01pm PST

How can I get the presentation slides?

Jane Chen | COMPUTER SCIENTIST
02/27/2018 9:46pm PST

The registration page is blank. Could you please check?

|
02/27/2018 6:54am PST

Hi Diana, there is no hardware or software installation required. We will use Databricks Community Edition: https://accounts.cloud.databricks.com/registration.html#signup/community
Some familiarity with Apache Spark will be helpful, but there will be a review of core ideas.

Diana Maltsman | ARCHITECT ADVISOR
02/21/2018 6:01am PST

What are the prerequisites and laptop hardware/software requirements?

Shrinivas Deshpande | MANAGER
02/06/2018 9:35am PST

Will this course start with fundamentals, or does it expect attendees to know basic concepts beforehand? How can someone who is interested but hasn't worked with Spark before get the most benefit?