Presented By O’Reilly and Cloudera
Make Data Work
March 5–6, 2018: Training
March 6–8, 2018: Tutorials & Conference
San Jose, CA

AI-powered crime prediction

Or Herman-Saffar (Dell), Ran Taig (Dell EMC)
2:40pm–3:20pm Wednesday, March 7, 2018
Law, ethics, and governance, Strata Business Summit
Location: 210 D/H
Level: Beginner
Average rating: 1.00 (2 ratings)

Who is this presentation for?

  • Data scientists

Prerequisite knowledge

  • Familiarity with basic machine learning models

What you'll learn

  • Learn how data scientists are using the Crimes in Chicago dataset to find interesting trends and make predictions for the future


What if we could predict when and where crimes will be committed? Crimes in Chicago, a publicly available dataset of reported crime incidents in Chicago since 2001, contains as many as 6.4 million rows; each row includes the crime type, the geographical location, and the date and time when the crime occurred. This extensive data source can form the basis for a machine learning model. One direct motivation for using the dataset is to predict crime counts for specific crime types, which would help the police decide where and when to increase their resources and so have a concrete impact on citizens’ safety. However, previous work on this dataset has been mostly descriptive (high-level explorations of the current state and of counts, i.e., how many crimes have been committed up to a specific point in time) rather than focused on predictive models.

Or Herman-Saffar and Ran Taig offer an overview of Crimes in Chicago and explain how to use this data to explore committed crimes, find interesting trends, and make predictions for the future. Or and Ran conclude by exploring the development of a machine learning model that predicts crime counts for a specific crime type on a given day in a specific district within Chicago, and they cover lessons and insights learned.
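
As a rough illustration of the prediction target described above (a crime count per crime type, day, and district), the following minimal sketch aggregates incident rows into daily counts and builds a naive day-of-week baseline. It is not the speakers' model. The sample rows, the district codes, and the function names daily_counts, weekday_baseline, and predict are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical sample rows: (date, district, crime_type).
# These are made-up values, not records from the real dataset.
incidents = [
    (date(2017, 1, 2), "011", "THEFT"),
    (date(2017, 1, 2), "011", "THEFT"),
    (date(2017, 1, 2), "011", "BATTERY"),
    (date(2017, 1, 9), "011", "THEFT"),
    (date(2017, 1, 9), "011", "THEFT"),
    (date(2017, 1, 9), "011", "THEFT"),
    (date(2017, 1, 9), "011", "THEFT"),
    (date(2017, 1, 3), "007", "THEFT"),
]


def daily_counts(rows):
    """Count incidents per (day, district, crime type)."""
    return Counter(rows)


def weekday_baseline(counts):
    """Mean count per (weekday, district, crime type).

    Simplification: the mean is taken only over days that have at least
    one recorded incident; days with zero incidents are not represented.
    """
    totals = defaultdict(int)
    days = defaultdict(int)
    for (day, district, crime_type), n in counts.items():
        key = (day.weekday(), district, crime_type)
        totals[key] += n
        days[key] += 1
    return {key: totals[key] / days[key] for key in totals}


def predict(baseline, day, district, crime_type):
    """Predict a count for a future day; 0.0 if the combination was never seen."""
    return baseline.get((day.weekday(), district, crime_type), 0.0)


counts = daily_counts(incidents)
baseline = weekday_baseline(counts)
# Forecast thefts in district "011" for the following Monday.
forecast = predict(baseline, date(2017, 1, 16), "011", "THEFT")
```

A baseline like this is mainly useful as a reference point: a learned model for the same target would need to beat it to justify its added complexity.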

Or Herman-Saffar


Or Herman-Saffar is a data scientist at Dell. She holds an MSc in biomedical engineering, where her research focused on breast cancer detection using breath signals and machine learning algorithms, and a BS in biomedical engineering specializing in signal processing from Ben-Gurion University, Israel.

Ran Taig

Dell EMC

Ran Taig is a senior data scientist at Dell EMC, where he leads data science projects, especially in the domain of hardware failure prediction, and plays a key role in designing the team’s engagement models and work structure, serving as a consultant to EMC’s business data lake team. Ran is also responsible for the team’s academic relations and continues to teach theory courses for CS students. Previously, Ran taught the Design of Algorithms and other CS theory courses at Ben-Gurion University. He holds a PhD in computer science from Ben-Gurion University, Israel, where he specialized in artificial intelligence. His research mainly focused on automated planning.


Eugene Kirpichov
03/07/2018 2:25pm PST

(Reiterating a point I raised in person at the talk, for visitors to this page, because I think it’s really important for people doing similar work to keep in mind.)

Crime prediction has very serious real-world impact, and it is imperative to keep in mind the real-world biases and consequences at play. One cannot treat this as simply an exercise in training and evaluating an ML model, especially if the intention is to give this model to policymakers or law enforcement.

For example, white people use drugs at the same frequency as non-white people, but people prosecuted for drug crimes are overwhelmingly non-white, because police intentionally look for such crimes (e.g. via random traffic stops) in non-white neighborhoods, but will rarely do the same in a white neighborhood. A model trained on the crime dataset would predict that police should hunt in non-white neighborhoods even more, and in white neighborhoods even less, thus further perpetuating the racial disparity in prosecution for drug crimes.

Addressing this is an active research area called “machine learning fairness”, which in fact concerns exactly the topic of this talk. There’s a class about it and a series of conferences and so on.