Joseph Kambourakis introduces you to Apache Spark 2.0 core concepts with a focus on Spark’s machine learning library, using text mining on real-world data as the primary end-to-end use case. Join Joseph to explore and wrangle data using Spark’s Dataset and DataFrame abstractions. You’ll use the Spark ML API to build an ML pipeline that transforms free text into useful features via Spark ML’s Transformer abstraction (e.g., one-hot encoding and term frequency counting). You’ll also learn about model selection, training and fitting, validation and inspection, and parameter tuning with grid search.
The class will consist of approximately 50% hands-on programming labs in Scala and 50% lecture and discussion.
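To make the two feature transformations named in the abstract concrete, here is a minimal sketch in plain Python. It is not Spark code and not taken from the class materials: the function names (`tokenize`, `build_vocabulary`, `term_frequency`, `one_hot`), the sample documents, and the category list are all assumptions made for illustration. It only shows what term frequency counting and one-hot encoding produce for a tiny input.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_vocabulary(documents):
    """Map each distinct token to a column index, in sorted order."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}


def term_frequency(text, vocabulary):
    """Count how often each vocabulary token occurs in one document."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:  # tokens outside the vocabulary are ignored
            vector[vocabulary[tok]] = n
    return vector


def one_hot(category, categories):
    """Encode one categorical value as a 0/1 vector with a single 1."""
    return [1 if category == c else 0 for c in categories]


# Hypothetical sample data, chosen only for this illustration.
docs = ["spark makes big data simple", "big data big models"]
vocab = build_vocabulary(docs)
tf_vectors = [term_frequency(d, vocab) for d in docs]
label_vector = one_hot("review", ["comment", "review", "question"])
```

In Spark ML the same steps would be expressed as pipeline stages operating on DataFrame columns, not as Python lists; the sketch only shows the shape of the resulting feature vectors.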
Joseph Kambourakis is a data science instructor at Databricks. Joseph has more than 10 years of experience teaching, over five of them with data science and analytics. Previously, Joseph was an instructor at Cloudera and a technical sales engineer at IBM. He has taught in over a dozen countries around the world and been featured on Japanese television and in Saudi newspapers. He is a rabid Arsenal FC supporter and competitive Magic: The Gathering player. Joseph holds a BS in electrical and computer engineering from Worcester Polytechnic Institute and an MBA with a focus in analytics from Bentley University. He lives with his wife and daughter in Needham, MA.
Comments
The slides can be found here: https://brookewenig.github.io/StrataSJC2018_Joe.html#/
How can I get the presentation slides?
The registration page is blank. Could you please check?
Hi Diana, There is no hardware or software installation required. We will use Databricks Community Edition: https://accounts.cloud.databricks.com/registration.html#signup/community
Some familiarity with Apache Spark will be helpful, but there will be a review of core ideas.
What are the prerequisites and the laptop hardware/software requirements?
Will this course start with fundamentals, or will it expect attendees to know the basic concepts beforehand? How can someone who has not worked with Spark before, but is interested, get the most benefit?