Presented By O'Reilly and Cloudera
Make Data Work
Sept 29–Oct 1, 2015 • New York, NY

How we amplify privilege with supervised machine learning

Mike Lee Williams (Cloudera Fast Forward Labs)
2:55pm–3:35pm Wednesday, 09/30/2015
Law, Ethics, & Open Data
Location: 3D 04/09
Level: Non-technical
Average rating: 4.85 (13 ratings)

This talk will use the example of sentiment analysis to show that supervised machine learning has the potential to amplify the voices of the most privileged people in society.

A sentiment analysis algorithm is considered ‘table stakes’ for any serious text analytics platform in social media, finance, or security. To keep the problem tractable and the results interpretable, these algorithms reduce the ‘sentiment’ of a text to a one-dimensional classification (very positive, fairly negative, etc.). As an example of supervised machine learning, I’ll briefly review how these algorithms are trained. I’ll explain this process qualitatively so you develop an intuition for what is going on, but I’ll also show Python code that will give you practical techniques you can apply to your own data.
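As a rough illustration of the training process described above, here is a minimal sketch of a supervised sentiment classifier. It is not the code from the talk: the naive Bayes approach, the `TRAIN` examples, and the `train` and `classify` helpers are all assumptions chosen to keep the example short and dependency-free.

```python
import math
from collections import Counter

# Hypothetical hand-labelled training set, invented for illustration.
TRAIN = [
    ("i love this great phone", "positive"),
    ("what a great happy day", "positive"),
    ("i love the friendly staff", "positive"),
    ("i hate this awful phone", "negative"),
    ("what an awful sad day", "negative"),
    ("i hate the rude staff", "negative"),
]


def train(examples):
    """Count words per label and documents per label."""
    word_counts = {}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the label with the highest naive Bayes log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        # Start from the log prior for this label.
        score = math.log(doc_counts[label] / total_docs)
        for word in text.split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids log(0).
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAIN)
prediction = classify("i love this happy day", model)
print(prediction)
```

Note that the model can only ever answer "positive" or "negative": whatever the text actually expresses is collapsed onto the single axis the labels define.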

This one-dimensional, supervised approach means that sentiment analysis algorithms fail to measure what they claim to measure, but they don’t measure nothing. Rather, they learn to spot unsubtle expressions of extreme emotion. In fact, the words that a simple algorithm finds most predictive of sentiment tend to be used by a particularly privileged group of authors: men.
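One way to inspect which words a simple model leans on is to rank the vocabulary by a smoothed log-odds ratio between the two classes. The sketch below assumes that scoring method and uses invented `LABELLED` data, so it shows only the mechanics: it says nothing about who uses which words, which is a finding that requires real data.

```python
import math
from collections import Counter

# Hypothetical labelled examples, invented for illustration only.
LABELLED = [
    ("this is the best game ever", "positive"),
    ("best day ever i love it", "positive"),
    ("it was quite nice i think", "positive"),
    ("this is the worst game ever", "negative"),
    ("worst day ever i hate it", "negative"),
    ("it was a bit disappointing i think", "negative"),
]


def log_odds(examples, smoothing=1.0):
    """Smoothed log-odds of each word appearing in positive vs negative text."""
    pos, neg = Counter(), Counter()
    for text, label in examples:
        (pos if label == "positive" else neg).update(text.split())
    vocab = set(pos) | set(neg)
    pos_total = sum(pos.values()) + smoothing * len(vocab)
    neg_total = sum(neg.values()) + smoothing * len(vocab)
    return {
        w: math.log((pos[w] + smoothing) / pos_total)
        - math.log((neg[w] + smoothing) / neg_total)
        for w in vocab
    }


scores = log_odds(LABELLED)
# Words with the largest absolute score carry the most weight in a prediction.
ranked = sorted(scores, key=lambda w: (-abs(scores[w]), w))
top_words = ranked[:2]
print(top_words)
```

In this toy data the blunt superlatives outrank the milder phrasing ("quite nice", "a bit disappointing") simply because they recur, which is the kind of effect the paragraph above describes.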

From this specific example, I will develop the ways in which a supervised machine-learning algorithm can embed biases that enhance privilege or are otherwise harmful: from training data, to figures of merit, to feature selection.

These issues are morally and legally important to everyone who is in the business of making inferences about people from data.


Mike Lee Williams

Cloudera Fast Forward Labs

Mike Lee Williams is a research engineer at Cloudera Fast Forward Labs, where he builds prototypes that bring the latest ideas in machine learning and AI to life and helps Cloudera’s customers understand how to make use of these new technologies. Mike holds a PhD in astrophysics from Oxford.