Brooke Wenig walks you through the core APIs for using Spark, fundamental mechanisms and basic internals of the framework, SQL and other high-level data access tools, and Spark’s streaming capabilities and machine learning APIs. Join in to learn how to perform machine learning on Spark and explore the algorithms supported by the Spark MLlib APIs.
Each topic includes lecture content along with hands-on use of Spark through an elegant web-based notebook environment. Notebooks allow attendees to code jobs, data analysis queries, and visualizations using their own Spark cluster, accessed through a web browser. You can keep the notebooks and continue to use them with the free Databricks Community Edition offering. Alternatively, each notebook can be exported as source code and run within any Spark environment.
Spark overview
Spark internals
Graph processing with GraphFrames
Spark ML’s Pipeline API for machine learning
Spark Structured Streaming
Brooke Wenig is an instructor and data science consultant for Databricks. Previously, she was a teaching associate at UCLA, where she taught graduate machine learning, senior software engineering, and introductory programming courses. Brooke also worked at Splunk and Under Armour as a KPCB fellow. She holds an MS in computer science, with highest honors, from UCLA, where she focused on distributed machine learning. Brooke speaks Mandarin Chinese fluently and enjoys cycling.
Get the Platinum pass or the Training pass to add this course to your package.
©2018, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com
Comments
Hi Brooke,
Will this session be held in Python or Java?
It will be held in Python.