17–19 October 2016: Conference & Tutorials
19–20 October 2016: Training
London, UK

Getting started contributing to Apache Spark

Holden Karau (Independent)
13:35–14:15 Tuesday, 18/10/2016
Location: Windsor Suite
Level: Intermediate

Prerequisite knowledge

  • Basic experience with Scala, Python, or Java
  • A general understanding of Spark

What you'll learn

  • Understand Apache Spark's current development process and how to contribute to Apache Spark if you are interested


Apache Spark is one of the most popular tools for big data, and with 400+ open pull requests as of this writing, it is also very actively developed. With such a large volume of contributions, it can be hard to know how to begin contributing yourself. Holden Karau offers a developer-focused head start, walking you through how to find good issues, format code, and find reviewers, and what to expect in the code review process. Holden also explores alternatives to contributing to Apache Spark directly (such as creating packages).

Holden Karau


Holden Karau is a transgender Canadian software engineer working in the Bay Area. Previously, she worked at IBM, Alpine, Databricks, Google (twice), Foursquare, and Amazon. Holden is the coauthor of Learning Spark, High Performance Spark, and another Spark book that’s a bit more out of date. She’s a committer on the Apache Spark, SystemML, and Mahout projects. When not in San Francisco, Holden speaks internationally about different big data technologies (mostly Spark). She was tricked into the world of big data while trying to improve search and recommendation systems and has long since forgotten her original goal. Outside of work, she enjoys playing with fire, riding scooters, and dancing.