Presented By O'Reilly and Cloudera
Make Data Work
22–23 May 2017: Training
23–25 May 2017: Tutorials & Conference
London, UK

Spark camp: Apache Spark 2.0 for analytics and text mining with Spark ML

Stephane Rion (Big Data Partnership)
9:00–17:00 Tuesday, 23 May 2017
Spark & beyond
Location: Capital Suite 11
Secondary topics: Text Analysis and Mining
Average rating: 4.00 (2 ratings)

What you'll learn

  • Explore Apache Spark 2.0 core concepts with a focus on Spark's machine-learning library

Description

Stephane Rion introduces you to Apache Spark 2.0 core concepts with a focus on Spark’s machine-learning library, using text mining on real-world data as the primary end-to-end use case.

Join Stephane to explore and wrangle data using Spark’s Dataset and DataFrame abstractions. You’ll use the Spark ML API to build a pipeline that transforms free text into useful features via Spark ML’s Transformer abstraction (e.g., one-hot encoding and term frequency counting), and you’ll learn about model selection, training/fitting, validation/inspection, and parameter tuning with grid search.
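As a rough illustration of the kind of pipeline described above, here is a minimal sketch in Scala. It is not taken from the course material: the column names ("text", "label"), the toy sentences, the choice of logistic regression, and the grid values are all assumptions made for this example. It uses term-frequency hashing only (no one-hot encoding) and tunes two parameters with a cross-validated grid search.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}
import org.apache.spark.sql.SparkSession

object TextPipelineSketch {
  // Transformers: free text -> lowercase tokens -> hashed term-frequency vectors.
  // Column names are hypothetical and chosen for this sketch only.
  val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
  val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")

  // Estimator: a simple binary classifier (an assumption; any classifier would do).
  val lr = new LogisticRegression().setMaxIter(10)

  // The pipeline chains the two Transformers and the Estimator.
  val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lr))

  // Grid search: 2 x 2 = 4 parameter combinations (illustrative values).
  val paramGrid = new ParamGridBuilder()
    .addGrid(hashingTF.numFeatures, Array(1000, 10000))
    .addGrid(lr.regParam, Array(0.1, 0.01))
    .build()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("text-pipeline-sketch")
      .getOrCreate()
    import spark.implicits._

    // Toy labelled data (made up): 20 positive and 20 negative documents.
    val training = (1 to 20).flatMap { i =>
      Seq(
        (s"spark ml pipeline text mining example $i", 1.0),
        (s"weather football recipe holiday note $i", 0.0)
      )
    }.toDF("text", "label")

    // Cross-validation fits the whole pipeline once per fold and per grid
    // point, then refits the best parameter combination on all the data.
    val cv = new CrossValidator()
      .setEstimator(pipeline)
      .setEvaluator(new BinaryClassificationEvaluator())
      .setEstimatorParamMaps(paramGrid)
      .setNumFolds(2)
      .setSeed(42L)

    val model = cv.fit(training)

    // Inspect predictions from the best model found by the grid search.
    model.transform(training).select("text", "prediction").show(false)

    spark.stop()
  }
}
```

Placing the feature Transformers inside the cross-validated pipeline means feature parameters such as `numFeatures` are tuned together with the model's own parameters rather than fixed in advance.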

The class will consist of approximately 50% hands-on programming labs in Scala and 50% lecture and discussion.

Stephane Rion

Big Data Partnership

Stephane Rion is a senior data scientist at Big Data Partnership, where he helps clients gain insight into their data by developing scalable analytical solutions in industries such as finance, gaming, and social services. Stephane has a strong background in machine learning and statistics, with over six years’ experience in data science and 10 years’ experience in mathematical modeling. He has solid hands-on skills in machine learning at scale with distributed systems like Apache Spark, which he has used to develop production-grade applications. In addition to Scala with Spark, Stephane is fluent in R and Python, which he uses daily to explore data, run statistical analyses, and build statistical models. He was the first Databricks-certified Spark instructor in EMEA. Stephane enjoys splitting his time between working on data science projects and teaching Spark classes, which he feels is the best way to remain at the forefront of the technology and to see how people are attempting to use Spark within their businesses.
