Presented By O’Reilly and Cloudera
Make Data Work
September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

How to be fair: A tutorial for beginners

Aileen Nielsen (Skillman Consulting)
1:30pm–5:00pm Tuesday, 09/11/2018
Data science and machine learning
Location: 1E 11 Level: Intermediate
Secondary topics: Ethics and Privacy
Average rating: 4.00 (4 ratings)

Who is this presentation for?

  • Data scientists

Prerequisite knowledge

  • Basic knowledge of machine learning and either Python or R (the tutorial is conducted in both languages but is structured so that knowing one of them is enough to work through most of the material)

Materials or downloads needed in advance

  • A laptop with Python and R installed
  • Please download relevant materials from the course GitHub repository (link TBD)

What you'll learn

  • Learn how to apply practical ethics lessons to your day-to-day workflows


There is mounting evidence that the widespread deployment of machine learning and artificial intelligence in business and government applications is reproducing or even amplifying existing prejudices and social inequalities. Even when an organization or an individual software engineer seeks to maintain fairness and accuracy, it’s easy to unintentionally create software that exhibits discriminatory or privacy-violating behavior.

Aileen Nielsen demonstrates how to identify and avoid bias and other unfairness in your analyses and apply best practices when developing new software and machine learning products.


Introduction and social relevance

  • Relevant news stories
  • A brief introduction to relevant legal concepts and their applicability to data analysis and model building

Data discovery

  • Examples of how “bad” or incomplete datasets can lead to discriminatory models
  • How to examine and balance your input data before it enters an analysis pipeline
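
The data discovery step above can be sketched in Python (the tutorial uses Python and R; this sketch and its toy data are illustrative assumptions, not course material):

```python
def group_balance(labels, groups):
    """Positive-outcome rate per group, to surface imbalance
    before data enters an analysis pipeline."""
    rates = {}
    for g in sorted(set(groups)):
        idx = [i for i, grp in enumerate(groups) if grp == g]
        rates[g] = sum(labels[i] for i in idx) / len(idx)
    return rates

# Toy outcome labels and a hypothetical protected attribute.
labels = [1, 0, 1, 1, 0, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(group_balance(labels, groups))  # {'a': 0.75, 'b': 0.0}
```

A gap this large between groups suggests rebalancing the data (for example by resampling or reweighting) before any model sees it.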

Data processing

  • Examples of how data processing has resulted in discriminatory models
  • How to examine your preprocessing pipeline to prevent discriminatory inputs
  • Examples of how data processing has resulted in privacy-violating models
  • How to examine your process for privacy leaks
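
One common preprocessing check for the issues above is scanning for proxy variables — features that strongly track a protected attribute even after it has been dropped. A minimal sketch, assuming the protected attribute is available as a 0/1 vector (the feature names and the 0.5 threshold are hypothetical illustrations):

```python
def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

protected = [0, 0, 0, 0, 1, 1, 1, 1]  # hypothetical protected attribute
features = {
    "zip_code_bucket": [0, 0, 1, 0, 1, 1, 1, 1],  # deliberately correlated
    "tenure_years":    [3, 5, 4, 1, 5, 2, 1, 4],
}
for name, values in features.items():
    r = pearson(values, protected)
    if abs(r) > 0.5:  # threshold chosen only for illustration
        print(f"{name} may act as a proxy for the protected attribute")
# flags zip_code_bucket but not tenure_years
```

Features flagged this way deserve scrutiny before they are passed downstream, since a model can rediscover a protected attribute through them.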


Model selection

  • Examples of how choice of model can lead to discriminatory results
  • Examples of how models can be designed to be more or less vulnerable to discriminatory input data
  • How to test your model and examine final parameters and fits for discriminatory behavior for a variety of common model families
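
One widely used test of a fitted model's behavior is the demographic parity gap: the spread in positive-prediction rates across groups. A minimal, model-agnostic sketch (the predictions and group labels here are toy assumptions):

```python
def demographic_parity_gap(preds, groups):
    """Difference between the highest and lowest positive-prediction
    rates across groups; 0.0 means parity under this metric."""
    rates = []
    for g in set(groups):
        members = [p for p, grp in zip(preds, groups) if grp == g]
        rates.append(sum(members) / len(members))
    return max(rates) - min(rates)

preds  = [1, 1, 0, 1, 0, 0, 1, 0]  # hypothetical model outputs
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(demographic_parity_gap(preds, groups))  # 0.5
```

Demographic parity is only one of several competing fairness definitions; which metric is appropriate depends on the application.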

Auditing your model

  • Examples of how even models following processes above may still yield discriminatory behavior
  • Auditing your model as a black box using existing Python tools
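
The black-box idea can be sketched without any fairness library: re-query the model with only the protected attribute changed and count how often the prediction flips. The model and data below are hypothetical stand-ins, not the tools used in the tutorial:

```python
def audit_flip_rate(model, rows, attr_index, values):
    """Black-box audit: for each row, vary only the protected attribute
    and measure the fraction of rows whose prediction changes."""
    flips = 0
    for row in rows:
        outputs = {model(row[:attr_index] + (v,) + row[attr_index + 1:])
                   for v in values}
        flips += len(outputs) > 1
    return flips / len(rows)

# Hypothetical model that (improperly) keys on feature 0, the protected attribute.
def biased_model(row):
    group, score = row
    threshold = 2 if group == "a" else 4
    return 1 if score > threshold else 0

rows = [("a", 3), ("b", 3), ("a", 5), ("b", 1)]
print(audit_flip_rate(biased_model, rows, 0, ["a", "b"]))  # 0.5
```

Because the audit only calls the model, it works on any predictor, regardless of how it was trained or whether its internals are inspectable.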

Research frontiers

  • Updates on how computer scientists and sociologists are developing new methods to avoid discriminatory and privacy-violating models
  • A roundup of newly published papers that illustrate the breadth and current state of this active area of research

Aileen Nielsen

Skillman Consulting

Aileen Nielsen works at an early-stage NYC startup that has something to do with time series data and neural networks. She is the author of Practical Time Series Analysis, published in 2019, and an upcoming book, Practical Fairness, to be published in summer 2020. Previously, Aileen worked at corporate law firms, physics research labs, and a variety of NYC tech startups, most recently the mobile health platform One Drop, as well as on Hillary Clinton's presidential campaign. Aileen currently serves as the chair of the NYC Bar's Science and Law Committee and as a Fellow in Law and Tech at ETH Zurich. She is a frequent speaker at machine learning conferences on both technical and legal subjects.

Comments on this page are now closed.


09/30/2018 7:28am EDT

can you please post the slides? Thank you

Aileen Nielsen
09/11/2018 7:35am EDT

Here is the git repo:

Alexander Pelivan | DATA ENGINEER
09/10/2018 7:06am EDT

Hi, can you please provide the link to the github repo?