Presented By O’Reilly and Cloudera
Make Data Work
September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

In-Person Training
Apache Spark programming

Kenneth Jones (Databricks, Inc.)
9:00am–5:00pm Tuesday, 09/11/2018
Location: 1A 17

To attend a training course, you must be registered for a Platinum or Training pass. The training course does not include access to tutorials on Tuesday.

Ken Jones walks you through the core APIs for using Spark, fundamental mechanisms and basic internals of the framework, SQL and other high-level data access tools, and Spark’s streaming capabilities and machine learning APIs.

What you'll learn, and how you can apply it

  • Understand Spark’s fundamental mechanics and Spark internals
  • Learn how to use the core Spark APIs to operate on data; build data pipelines and query large datasets using Spark SQL and DataFrames; analyze Spark jobs using the administration UIs and logs inside Databricks; and create Structured Streaming and machine learning jobs
  • Be able to articulate and implement typical use cases for Spark

This training is for you because...

  • You're a software developer, data analyst, data engineer, or data scientist who wants to use Apache Spark for machine learning and data science.

Prerequisites:

  • Experience coding in Python or Scala and using Spark
  • A basic understanding of data science topics and terminology
  • Familiarity with DataFrames (useful but not required)

Hardware and/or installation requirements:

  • A laptop with an up-to-date version of Chrome or Firefox installed (Internet Explorer not supported)

Join in to learn how to perform machine learning on Spark and explore the algorithms supported by the Spark MLlib APIs.

Each topic includes a lecture combined with hands-on exercises that use Spark through an elegant web-based notebook environment. Notebooks allow you to code jobs, data analysis queries, and visualizations using your own Spark cluster, accessed through a web browser. You can keep the notebooks and continue to use them with the free Databricks Community Edition offering. Alternatively, each notebook can be exported as source code and run within any Spark environment.

Outline

Spark overview

  • The DataFrames programming API
  • Spark SQL
  • The Catalyst query optimizer
  • The Tungsten in-memory data format
  • The Dataset API, encoders, and decoders
  • Use of the Spark UI to help understand DataFrame behavior and performance
  • Caching and storage levels

Spark internals

  • How Spark schedules and executes jobs and tasks
  • Shuffling, shuffle files, and performance
  • How various data sources are partitioned
  • How Spark handles data reads and writes

Graph processing with GraphFrames

Spark ML’s Pipeline API for machine learning

Spark Structured Streaming

About your instructor

Ken Jones is an Apache Spark instructor at Databricks. Ken has thousands of hours of in-class instruction experience presenting classes on Spark, Scala, and other open source technologies to Fortune 500 companies and individual developers worldwide. Previously, Ken was a senior instructor at Twitter, where in his role as coordinator for Twitter’s engineering onboarding program, he taught classes on Scala programming and backend service development in Scala. Ken also spent several years teaching Android application development and Android operating system internals, as well as several programming languages. He is the coauthor of Practical Programming in Tcl and Tk, 4th edition, and Tcl and the Tk Toolkit, 2nd edition. Ken lives in San Diego, CA, with his husband, Dean, and their cat, Jasper. He enjoys traveling extensively for work to accumulate airline miles and hotel points so that he can travel extensively for pleasure. When not in front of a class or wandering about strange cities, he likes to read and watch science fiction and fantasy, listen to jazz and ’80s alternative music, and mix (and drink) cocktails.

Conference registration

Get the Platinum pass or the Training pass to add this course to your package.


Comments

Joseph Kambourakis | DATABRICKS
07/28/2018 8:30am EDT

If you have any questions, please email me directly at josephk@databricks.com.