Presented by O'Reilly and Cloudera
Make Data Work
Dec 4–5, 2017: Training
Dec 5–7, 2017: Tutorials & Conference
Holden Karau
Software Engineer, Independent


Holden Karau is a transgender Canadian software engineer working in the Bay Area. Previously, she worked at IBM, Alpine, Databricks, Google (twice), Foursquare, and Amazon. Holden is the coauthor of Learning Spark, High Performance Spark, and another Spark book that’s a bit more out of date. She’s a committer on the Apache Spark, SystemML, and Mahout projects. When not in San Francisco, Holden speaks internationally about different big data technologies (mostly Spark). She was tricked into the world of big data while trying to improve search and recommendation systems and has long since forgotten her original goal. Outside of work, she enjoys playing with fire, riding scooters, and dancing.


4:15pm–4:55pm Wednesday, December 6, 2017
Holden Karau (Independent)
Average rating: 4.50 (6 ratings)
Apache Spark’s machine learning (ML) pipelines provide a lot of power, but sometimes the tools you need for your specific problem aren’t available yet. Holden Karau introduces Spark’s ML pipelines and explains how to extend them with your own custom algorithms, allowing you to take advantage of Spark’s meta-algorithms and existing ML tools.
11:15am–11:55am Thursday, December 7, 2017
Holden Karau (Independent), Joey Echeverria (Rocana)
Average rating: 4.20 (5 ratings)
Apache Spark offers greatly improved performance over traditional MapReduce models. Much of Apache Spark’s power comes from lazy evaluation along with intelligent pipelining, which can make debugging more challenging. Holden Karau and Joey Echeverria explore how to debug Apache Spark applications, the different options for logging in Spark, and more.