Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML

Brooke Wenig (Databricks)
9:00am–5:00pm Tuesday, September 26, 2017
Spark & beyond
Location: 1A 08/10
Secondary topics:  Text

What you'll learn

  • Explore Apache Spark 2.0 core concepts with a focus on Spark's machine learning library

Description

Brooke Wenig introduces you to Apache Spark 2.0 core concepts with a focus on Spark’s machine learning library, using text mining on real-world data as the primary end-to-end use case.

Join in to explore and wrangle data using Spark’s Dataset and DataFrame abstractions. You’ll use the Spark ML API to build an ML pipeline that transforms free text into useful features via Spark ML’s Transformer abstraction (e.g., one-hot encoding and term frequency counting). You’ll also learn about model selection, training/fitting, and validation/inspection, as well as parameter tuning with grid search.
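To make the two feature transformations named above concrete, here is a minimal sketch in plain Python. It does not use Spark or the Spark ML API, and it is not taken from the tutorial material: the function names (`tokenize`, `term_frequencies`, `one_hot`) and the sample sentence are hypothetical, chosen only to show what term frequency counting and one-hot encoding compute on a small input.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(tokens):
    """Count how many times each token occurs in one document."""
    return Counter(tokens)


def one_hot(value, categories):
    """Return a 0/1 vector with a single 1 at the position of `value`."""
    return [1 if value == category else 0 for category in categories]


# Hypothetical sample document, for illustration only.
doc = "Spark makes text mining fun, and Spark scales"
tokens = tokenize(doc)
tf = term_frequencies(tokens)

# Hypothetical categorical feature with three known values.
languages = ["java", "python", "scala"]
encoded = one_hot("scala", languages)

print(tf.most_common(1))
print(encoded)
```

In a Spark ML pipeline, each of these steps would typically be a separate stage applied to a DataFrame column rather than a function called on a single string; the sketch only illustrates the underlying computation.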

The class will consist of approximately 50% hands-on programming labs in Scala and 50% lecture and discussion.


Brooke Wenig

Databricks

Brooke Wenig is an instructor and data science consultant for Databricks. Previously, she was a teaching associate at UCLA, where she taught graduate machine learning, senior software engineering, and introductory programming courses. Brooke also worked at Splunk and Under Armour as a KPCB fellow. She holds an MS in computer science with highest honors from UCLA with a focus on distributed machine learning. Brooke speaks Mandarin Chinese fluently and enjoys cycling.

Comments on this page are now closed.

Comments

Brooke Wenig | INSTRUCTOR AND DATA SCIENCE CONSULTANT
10/02/2017 8:53am EDT

The slides are publicly available. However, I cannot share the course material with people who were not able to attend the session.

Sophia DeMartini | SENIOR SPEAKER MANAGER
09/26/2017 6:44am EDT

@Mohammed – if the speaker decides they want the tutorial included in the video compilation and signs the presenter contract, then you’ll be able to access it.

Mohammed Ayub | DATA SCIENTIST
09/26/2017 6:14am EDT

Thanks, Sophia. Also, will you be posting the recording on the portal later?

Sophia DeMartini | SENIOR SPEAKER MANAGER
09/26/2017 5:51am EDT

Here’s the link to the documentation:

https://brookewenig.github.io/StrataNYC2017.html

Mukul Bharadwaj | JOHN WILEY AND SONS
09/26/2017 5:46am EDT

Can you post the tutorial documentation URL in this thread?

Shrinath Parikh | LEAD BIG DATA ANALYTICS ENGINEER
09/26/2017 5:29am EDT

Do you have a GitHub repository where people who could not attend due to the conflict can download the tutorial, or at least the dataset and instructions?

Mohammed Ayub | DATA SCIENTIST
09/25/2017 7:05pm EDT

Unfortunately, I will miss this as it conflicts with another tutorial. Will this material be available for gold pass members?

Brooke Wenig | INSTRUCTOR AND DATA SCIENCE CONSULTANT
09/05/2017 6:45pm EDT

Hi Thiago. All you need is a laptop. I will ask everyone to create a Databricks Community Edition account at the beginning of class if they have not already.

Thiago Martins dos Reis | PRODUCT DEVELOPMENT ENGINEER
09/05/2017 4:53am EDT

What do we need to bring to this session? Just a laptop and an account at Databricks? Thanks.

Sarah Kim | PRODUCT MANAGER, WORLDWIDE EVENTS
08/07/2017 8:25am EDT

Hi Reema, you can change your tutorial selection in your Account. Just click Account at the top right in the navigation area.

e9060ec0 8038ff30 | DATA WAREHOUSE ARCHITECT
08/07/2017 6:41am EDT

I would like to drop this tutorial and change it to Architecting A Data Platform and Architecting a next-generation data platform