Data science and machine learning with Apache Spark (SOLD OUT)
Monday, 21 May & Tuesday, 22 May, 9:00 - 17:00
Location: Capital Suite 1
Behzad Bordbar demonstrates how to implement typical data science workflows using Apache Spark. You'll learn how to wrangle and explore data using Spark SQL DataFrames and how to build, evaluate, and tune machine learning models using Spark MLlib.
What you'll learn, and how you can apply it
- Learn how to use Spark SQL DataFrames to load, explore, transform, join, and analyze data
- Learn how to use Spark MLlib to build, evaluate, and tune machine learning models
This training is for you because...
- You're a data scientist who wants to learn how to use Spark to scale your workflows up to large, distributed datasets.
- You're a data engineer, data analyst, or developer who wants to learn how to implement typical data science and machine learning workflows in Spark.
Prerequisite knowledge
- A working knowledge of Python
- A basic understanding of data analysis, statistical modeling, and machine learning
Hardware and/or installation requirements:
- A laptop with a modern version of Chrome or Firefox installed
Demonstrations and exercises will be conducted in Python using Cloudera Data Science Workbench.
Outline
- Introduction to Spark SQL DataFrames
- Reading and writing DataFrames
- Transforming and joining DataFrames
- Grouping and exploring DataFrames
- Introduction to Spark MLlib
- Extracting and transforming features
- Building and evaluating regression, classification, and clustering models
- Tuning hyperparameters and validating models
- Working with machine learning pipelines
About your instructor
Behzad Bordbar is a mathematician, software engineer, and big data technical instructor at Cloudera, where he teaches courses on Hadoop, Hive, Impala, and Spark. Behzad has worked in academia for over 12 years and has been a visiting scientist at HP, BT, and IBM.
Get the Platinum pass or the Training pass to add this course to your package.