Sep 23–26, 2019

A practical guide to algorithmic bias and explainability in machine learning

Alejandro Saucedo (The Institute for Ethical AI & Machine Learning)
11:20am–12:00pm Thursday, September 26, 2019
Location: 1A 12
Secondary topics:  Ethics

Who is this presentation for?

Data Scientist, Software Engineer, Data Engineer, Product Manager, Technical Lead, Engineering Manager




Undesired bias in machine learning has become a worrying topic due to the numerous high-profile incidents that have been covered by the media. It is certainly a challenging topic, as it could even be said that the concept of societal bias is itself inherently biased, depending on an individual’s (or group’s) perspective. In this talk we avoid reinventing the wheel; instead, we use traditional methods to simplify this issue so it can be tackled from a practical perspective.

In this talk we will cover the high-level definitions of bias in machine learning to remove ambiguity, and we will demystify it through a hands-on example. Our objective will be to automate the loan approval process for a company using machine learning. This will allow us to go through the challenge step by step, using key tools and techniques from the latest research to assess and mitigate undesired bias in our machine learning models.

We will begin by providing a high-level definition of undesired bias as two constituent parts: “a-priori societal bias” and “a-posteriori statistical bias”. We will provide tangible examples of how undesired bias is introduced in each step. This initial section will also introduce research findings on this topic. Spoiler alert: we will take a pragmatic approach, showing how any non-trivial system will always have an inherent bias, so the objective is not to remove bias, but to make sure 1) you can get as close as possible to your objectives, and 2) your objectives are as close as possible to the “ideal solution”.

In this talk we introduce a pragmatic process to assess bias in machine learning models through three key steps: 1) data analysis, 2) inference result analysis, and 3) production metrics analysis. For each of these three steps we will walk through a real-life example, in which we are tasked with automating a loan approval process. We will show how bias may affect our results in a negative way, as well as how we can use various techniques to ensure we perform a reasonable analysis. Our objective is not to show how to completely remove bias from a machine learning model, but rather to show the tools and techniques available, as well as the key touch-points and metrics that ensure the right domain experts are involved.
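To make the second step (inference result analysis) more concrete, the sketch below compares how a loan approval model treats two groups of applicants. It is only an illustration under stated assumptions, not the material presented in the talk: the records, the group labels "A" and "B", and the helper `group_metrics` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy records: (group, true_label, predicted_label).
# A label of 1 means "should be approved" (true) or "was approved" (predicted).
records = [
    ("A", 1, 1), ("A", 1, 1), ("A", 0, 1), ("A", 0, 0),
    ("B", 1, 1), ("B", 1, 0), ("B", 0, 0), ("B", 0, 0),
]


def group_metrics(rows):
    """Return the approval rate and true positive rate for each group."""
    counts = defaultdict(lambda: {"n": 0, "approved": 0, "pos": 0, "tp": 0})
    for group, y_true, y_pred in rows:
        c = counts[group]
        c["n"] += 1
        c["approved"] += y_pred
        c["pos"] += y_true
        if y_true == 1 and y_pred == 1:
            c["tp"] += 1

    result = {}
    for group, c in counts.items():
        result[group] = {
            # Share of applicants in the group that the model approved.
            "approval_rate": c["approved"] / c["n"],
            # Share of creditworthy applicants that the model approved.
            "tpr": c["tp"] / c["pos"] if c["pos"] else None,
        }
    return result


metrics = group_metrics(records)
for group, values in sorted(metrics.items()):
    print(group, values)
```

A large gap between groups in either metric does not by itself prove that the model is unfair; it is a signal that could be raised with the relevant domain experts at the touch-points described above.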

We will cover fundamental topics in data science such as feature importance analysis, class imbalance assessment, model evaluation metrics, partial dependence, and feature correlation. More importantly, we will cover how these fundamentals can interact at different touch-points with the right domain experts to ensure undesired bias is identified and documented. All of this will be covered with a hands-on example in a practical Jupyter notebook.
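Two of the fundamentals listed above, class imbalance assessment and feature correlation, can be sketched with the standard library alone. This is a minimal illustration on made-up data rather than the notebook used in the talk: the lists `labels`, `protected` and `postcode_score`, and the helpers `class_balance` and `pearson`, are hypothetical names introduced here.

```python
from collections import Counter
from math import sqrt

# Hypothetical loan outcomes: 1 = approved, 0 = rejected.
labels = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]

# Hypothetical protected attribute (0/1) and a feature that may act as a proxy for it.
protected = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
postcode_score = [1, 2, 1, 2, 1, 4, 5, 4, 5, 4]


def class_balance(ys):
    """Return the fraction of examples that fall in each class."""
    counts = Counter(ys)
    total = len(ys)
    return {label: n / total for label, n in counts.items()}


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


balance = class_balance(labels)
proxy_correlation = pearson(protected, postcode_score)

print("class balance:", balance)
print("correlation with protected attribute:", round(proxy_correlation, 3))
```

A heavily skewed class balance suggests that plain accuracy may be misleading, and a feature that correlates strongly with a protected attribute can leak that attribute into the model even when the attribute itself is dropped. Both observations are the kind of finding worth documenting and reviewing with domain experts.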

Prerequisite knowledge

The audience should consist of members of the development team of a machine learning system, including software engineers and data scientists. Ideally the audience will have experience with a machine learning project (prototype or production).

What you'll learn

The audience will get a high-level philosophical overview of the concept of bias in machine learning, which will remove ambiguity and help simplify the challenge when it is faced in a practical situation. We will equip the audience with key tools and techniques to assess, identify and mitigate the risks that arise from the unavoidable bias present.

Alejandro Saucedo

The Institute for Ethical AI & Machine Learning

Alejandro is the Chief Scientist at the Institute for Ethical AI & Machine Learning, where he leads highly technical research on machine learning explainability, bias evaluation, reproducibility and responsible design. With over 10 years of software development experience, Alejandro has held technical leadership positions across hyper-growth scale-ups and tech giants including Eigen Technologies, Bloomberg LP and Hack Partners. He has a strong track record of building departments of machine learning engineers from scratch, and of leading the delivery of large-scale machine learning systems across the financial, insurance, legal, transport, manufacturing and construction sectors (in Europe, the US and Latin America).
