Brought to you by NumFOCUS Foundation and O’Reilly Media
The official Jupyter Conference
Aug 21-22, 2018: Training
Aug 22-24, 2018: Tutorials & Conference
New York, NY

What things are correlated with gender diversity: A dig through the ASF and Jupyter projects

Holden Karau (Independent), Matthew Hunt (Bloomberg)
1:50pm–2:30pm Friday, August 24, 2018
Community, Usage and application
Location: Sutton Center/Sutton South
Level: Beginner

Who is this presentation for?

  • Anyone who cares about improving the health of open source communities

Prerequisite knowledge

  • A basic understanding of OSS and Python
  • Familiarity with PySpark (useful but not required)

What you'll learn

  • How to improve the diversity of your open source projects


Many of us (including O’Reilly, Google, and the Python Software Foundation) believe that gender diversity in open source projects is important. (If you don’t, this talk isn’t going to convince you.) But what is correlated with improved gender diversity, and what can we learn from other industries that faced similar problems in the past?

Holden Karau and Matt Hunt explore the diversity of different projects, examine historic EEOC complaints, and detail parallels and historic solutions. To keep things interesting, Holden and Matt conclude with a comparative analysis of the state of OSS and various complaints handled by the EEOC in the ’60s, along with the solutions, suggestions, and binding settlements that were reached for similar diversity problems in other industries. This comparison is not legal advice but rather a set of examples of what we can learn from early Equal Employment Opportunity Commission (EEOC) decisions.

Topics include:

  • Diversity of gender among the different levels of a given project’s leadership (committers, PMC, etc.)
  • The existence of codes of conduct
  • Language used in comments, code, and mailing lists
  • The rate of promotions for project participants

Holden Karau


Holden Karau is a transgender Canadian software engineer working in the Bay Area. Previously, she worked at IBM, Alpine, Databricks, Google (twice), Foursquare, and Amazon. Holden is the coauthor of Learning Spark, High Performance Spark, and another Spark book that’s a bit more out of date. She’s a committer on the Apache Spark, SystemML, and Mahout projects. When not in San Francisco, Holden speaks internationally about different big data technologies (mostly Spark). She was tricked into the world of big data while trying to improve search and recommendation systems and has long since forgotten her original goal. Outside of work, she enjoys playing with fire, riding scooters, and dancing.


Matthew Hunt


Matthew Hunt started playing with computers when he was 8, sold his first program at 13, and retains an unhealthy degree of curiosity. He lives in New York, where he can be found tinkering with 3D printers, dabbling in the future of flight, playing with VR headsets, and even doing work sometimes. He still believes that where you find people having the most fun, there will you find the future being created. Matthew runs the NYC Spark user group.

Comments on this page are now closed.


Douglas Blank | HEAD OF RESEARCH
08/16/2018 3:23am EDT

Looks like a great presentation! Unfortunately, I’ll be down the hall presenting myself. But not on a completely unrelated topic… I’ll be talking about the use of Jupyter over the last four years at an all-women’s college. I think many of the items that you have listed could apply to the classroom, too.