Brooke Wenig introduces you to Apache Spark 2.0 core concepts with a focus on Spark’s machine learning library, using text mining on real-world data as the primary end-to-end use case.
Join in to explore and wrangle data using Spark’s Dataset and DataFrame abstractions. You’ll use the Spark ML API to build an ML pipeline that transforms free text into useful features via Spark ML’s Transformer abstraction (e.g., one-hot encoding and term frequency counting) and learn about model selection, training/fitting, and validation/inspection, as well as hyperparameter tuning with grid search.
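To give a flavor of the kind of pipeline described above, here is a minimal sketch in Scala against the Spark 2.x ML API. It is not taken from the course labs: the object name `TextPipelineSketch`, the toy sentences, the column names, and the grid values are all assumptions made for illustration. It chains a `Tokenizer` and a `HashingTF` (term frequency counting) into a `Pipeline` with a logistic regression model, then tunes two parameters with a grid search evaluated by cross-validation.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}
import org.apache.spark.sql.SparkSession

// Hypothetical example object; names and data are illustrative only.
object TextPipelineSketch {

  // Build a three-stage pipeline and a 2 x 2 parameter grid over it.
  def buildPipelineAndGrid(): (Pipeline, Array[ParamMap]) = {
    // Transformer: split free text into lowercase word tokens.
    val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
    // Transformer: hash tokens into a fixed-size term-frequency vector.
    val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
    // Estimator: reads the default "features" and "label" columns.
    val lr = new LogisticRegression().setMaxIter(10)

    val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lr))

    // Grid search: every combination of these values is tried.
    val grid = new ParamGridBuilder()
      .addGrid(hashingTF.numFeatures, Array(256, 1024))
      .addGrid(lr.regParam, Array(0.01, 0.1))
      .build()

    (pipeline, grid)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("text-pipeline-sketch")
      .master("local[*]") // assumption: local run for experimentation
      .getOrCreate()
    import spark.implicits._

    // Toy labeled data (made up): 1.0 = mentions Spark, 0.0 = does not.
    val training = Seq(
      ("spark makes distributed computing simple", 1.0),
      ("the weather was cold and rainy today", 0.0),
      ("spark ml pipelines chain transformers", 1.0),
      ("we cooked pasta for dinner", 0.0),
      ("dataframes in spark are distributed tables", 1.0),
      ("the train arrived ten minutes late", 0.0),
      ("spark supports scala java and python", 1.0),
      ("she enjoys cycling on weekends", 0.0)
    ).toDF("text", "label")

    val (pipeline, grid) = buildPipelineAndGrid()

    // Model selection: cross-validate each grid point and keep the best.
    val cv = new CrossValidator()
      .setEstimator(pipeline)
      .setEvaluator(new BinaryClassificationEvaluator())
      .setEstimatorParamMaps(grid)
      .setNumFolds(2)

    val model = cv.fit(training)

    // Inspect predictions from the best model found by the search.
    model.transform(training).select("text", "label", "prediction").show(false)

    spark.stop()
  }
}
```

Running this requires the `spark-sql` and `spark-mllib` artifacts on the classpath. With only eight rows the cross-validation scores are not meaningful; the point is the shape of the workflow, where the whole pipeline is treated as a single estimator so that feature-extraction parameters and model parameters are tuned together.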
The class will consist of approximately 50% hands-on programming labs in Scala and 50% lecture and discussion.
Brooke Wenig is an instructor and data science consultant for Databricks. Previously, she was a teaching associate at UCLA, where she taught graduate machine learning, senior software engineering, and introductory programming courses. Brooke also worked at Splunk and Under Armour as a KPCB fellow. She holds an MS in computer science with highest honors from UCLA with a focus on distributed machine learning. Brooke speaks Mandarin Chinese fluently and enjoys cycling.
©2017, O'Reilly Media, Inc.