Presented By O’Reilly and Cloudera

San Jose • London • New York

Make Data Work

March 5–6, 2018: Training
March 6–8, 2018: Tutorials & Conference
San Jose, CA

Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML

Joseph Kambourakis (Databricks)

9:00am–5:00pm Tuesday, March 6, 2018

Data science and machine learning
Location: LL20 D

Prerequisite knowledge

If you are registered for this tutorial, please go to https://accounts.cloud.databricks.com/registration.html#signup/community and sign up before you arrive onsite.

What you'll learn

Explore Apache Spark 2.0 core concepts with a focus on Spark's machine learning library

Description

Joseph Kambourakis introduces you to Apache Spark 2.0 core concepts with a focus on Spark’s machine learning library, using text mining on real-world data as the primary end-to-end use case. Join Joseph to explore and wrangle data using Spark’s Dataset and DataFrame abstractions. You’ll use the Spark ML API to build an ML pipeline that transforms free text into useful features via Spark ML’s Transformer abstraction (e.g., one-hot encoding and term frequency counting). You’ll also learn about model selection, training and fitting, validation and inspection, and parameter tuning with grid search.
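
As a rough illustration of the kind of pipeline described above, here is a minimal Scala sketch. It is not taken from the tutorial materials. It assumes Spark 2.x with the spark-mllib module on the classpath, and the object name, sample sentences, labels, and grid values are all made up for the example. It covers tokenizing, term frequency counting, model fitting, and grid search with cross-validation; one-hot encoding is left out.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}
import org.apache.spark.sql.SparkSession

// Hypothetical example object; none of these names come from the session.
object TextPipelineSketch {
  // Transformer: split free text into lowercase words.
  val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")

  // Transformer: hash each word to an index and count term frequencies.
  val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")

  // Estimator: a simple classifier trained on the "features" and "label" columns.
  val lr = new LogisticRegression().setMaxIter(10)

  // The pipeline chains the stages so they are fit and applied together.
  val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lr))

  // Grid search: every combination of these values is tried (2 x 2 = 4 here).
  val paramGrid = new ParamGridBuilder()
    .addGrid(hashingTF.numFeatures, Array(64, 256))
    .addGrid(lr.regParam, Array(0.01, 0.1))
    .build()

  def main(args: Array[String]): Unit = {
    // In a Databricks notebook a session named `spark` already exists;
    // a local one is created here so the sketch can run standalone.
    val spark = SparkSession.builder()
      .appName("text-pipeline-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up training data: label 1.0 if the text mentions Spark, else 0.0.
    val training = Seq(
      ("spark makes data work", 1.0),
      ("the weather is nice today", 0.0),
      ("spark ml builds pipelines", 1.0),
      ("football season starts soon", 0.0),
      ("apache spark for text mining", 1.0),
      ("cooking dinner at home", 0.0),
      ("dataframes in spark are handy", 1.0),
      ("reading a long novel", 0.0),
      ("tuning a spark model", 1.0),
      ("walking the dog outside", 0.0),
      ("spark runs on a cluster", 1.0),
      ("planting flowers in spring", 0.0)
    ).toDF("text", "label")

    // Cross-validation picks the parameter combination with the best metric.
    val cv = new CrossValidator()
      .setEstimator(pipeline)
      .setEvaluator(new BinaryClassificationEvaluator())
      .setEstimatorParamMaps(paramGrid)
      .setNumFolds(2)
      .setSeed(42L)

    val cvModel = cv.fit(training)

    // Apply the best model found to unseen, equally made-up text.
    val unseen = Seq("spark for analytics", "a quiet evening").toDF("text")
    cvModel.transform(unseen).select("text", "prediction").show(false)

    spark.stop()
  }
}
```

With a dataset this small the predictions mean very little. The point is only the shape of the workflow: Transformers produce features, an Estimator fits a model, and the cross-validator searches the parameter grid.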

The class will consist of approximately 50% hands-on programming labs in Scala and 50% lecture and discussion.

Joseph Kambourakis

Databricks

Joseph Kambourakis is a data science instructor at Databricks. Joseph has more than 10 years of experience teaching, over five of them with data science and analytics. Previously, Joseph was an instructor at Cloudera and a technical sales engineer at IBM. He has taught in over a dozen countries around the world and been featured on Japanese television and in Saudi newspapers. He is a rabid Arsenal FC supporter and competitive Magic: The Gathering player. Joseph holds a BS in electrical and computer engineering from Worcester Polytechnic Institute and an MBA with a focus in analytics from Bentley University. He lives with his wife and daughter in Needham, MA.

Comments on this page are now closed.

Comments

Joseph Kambourakis | DATABRICKS

03/10/2018 3:07am PST

The slides can be found here: https://brookewenig.github.io/StrataSJC2018_Joe.html#/

Saran Thangavelu | LEAD & ARCHITECT

03/09/2018 10:01pm PST

How can I get the presentation slides?

Jane Chen | COMPUTER SCIENTIST

02/27/2018 9:46pm PST

The registration page is blank. Could you please check?

02/27/2018 6:54am PST

Hi Diana, there is no hardware or software installation required. We will use Databricks Community Edition: https://accounts.cloud.databricks.com/registration.html#signup/community
Some familiarity with Apache Spark will be helpful, but there will be a review of core ideas.

Diana Maltsman | ARCHITECT ADVISOR

02/21/2018 6:01am PST

What are the prerequisites and the laptop hardware/software requirements?

Shrinivas Deshpande | MANAGER

02/06/2018 9:35am PST

Will this course start with fundamentals, or will attendees be expected to know the basic concepts beforehand? How can someone who is interested but has not worked with Spark before get the most benefit?

Sponsorship Opportunities

For exhibition and sponsorship opportunities, email strataconf@oreilly.com

Partner Opportunities

For information on trade opportunities with O'Reilly conferences, email partners@oreilly.com

©2018, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com