Stephane Rion introduces you to Apache Spark 2.0 core concepts with a focus on Spark’s machine-learning library, using text mining on real-world data as the primary end-to-end use case.
Join Stephane to explore and wrangle data using Spark’s DataSet and DataFrame abstractions. You’ll use the Spark ML API to build an ML pipeline to transform free text into useful features via Spark ML’s Transformer abstraction (e.g., one-hot encoding and term frequency counting) and learn about model selection, training/fitting, and validation/inspection, as well as parameter tuning with grid search parameter selection.
The class will consist of approximately 50% hands-on programming labs in Scala and 50% lecture and discussion.
Stephane Rion is a senior data scientist at Big Data Partnership, where he helps clients get insight into their data by developing scalable analytical solutions in industries such as finance, gaming, and social services. Stephane has a strong background in machine learning and statistics with over 6 years’ experience in data science and 10 years’ experience in mathematical modeling. He has solid hands-on skills in machine learning at scale with distributed systems like Apache Spark, which he has used to develop production rate applications. In addition to Scala with Spark, Stephane is fluent in R and Python, which he uses daily to explore data, run statistical analysis, and build statistical models. He was the first Databricks-certified Spark instructor in EMEA. Stephane enjoys splitting his time between working on data science projects and teaching Spark classes, which he feels is the best way to remain at the forefront of the technology and capture how people are attempting to use Spark within their businesses.
©2017, O’Reilly UK Ltd • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • email@example.com