Joseph Kambourakis walks you through using Apache Spark to perform exploratory data analysis (EDA), developing machine learning pipelines, and using the APIs and algorithms available in the Spark MLlib DataFrames API. Joseph also covers parallelizing machine learning algorithms at a conceptual level.
Joseph takes a pragmatic approach. He focuses on using Apache Spark for data analysis and on building models with MLlib, and limits the time spent on machine learning theory and Spark internals. You’ll work through examples based on public datasets. Along the way, you’ll learn how Apache Spark can help you iterate faster and develop models on massive datasets, and how to use familiar Python libraries with Spark’s distributed, scalable engine. You’ll leave with the tools and knowledge you need to start using Spark for practical data analysis and machine learning problems. You’ll also gain a firm understanding of DataFrames, the DataFrames MLlib API, and the related documentation.
Joseph Kambourakis is a data science instructor at Databricks. Joseph has more than 10 years of experience teaching, over five of them with data science and analytics. Previously, Joseph was an instructor at Cloudera and a technical sales engineer at IBM. He has taught in over a dozen countries around the world and been featured on Japanese television and in Saudi newspapers. He is a rabid Arsenal FC supporter and competitive Magic: The Gathering player. Joseph holds a BS in electrical and computer engineering from Worcester Polytechnic Institute and an MBA with a focus in analytics from Bentley University. He lives with his wife and daughter in Needham, MA.
©2017, O'Reilly Media, Inc. • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.