Make Data Work
Oct 15–17, 2014 • New York, NY

Big Data vs Zombies: Using Algorithms, Big Data, and Large Scale Distributed Processing to Combat Identity Fraud

Jesse Shaw (LexisNexis)
11:50am–12:30pm Friday, 10/17/2014
Location: 1 E10/1 E11
Average rating: 4.40 (5 ratings)

This presentation takes a closer look at how linking algorithms have been reframed by the advent of high-performance distributed processing systems, challenging traditional methods and approaches. The scale of data is no longer a hindrance for linking algorithms. The ability to process data at scale and at high speed is now a key differentiator for smart linking algorithms: the more data they have, the more they learn about that data, the better they perform, and the more precisely they understand what makes you “you” in data.

This session will cover the value linking algorithms bring to identity risk management, and how LexisNexis has applied linking algorithms, data, and supercomputing capability to the challenge of managing identity risk and combatting identity fraud. We will explore why data point “specificity” is important: for instance, how specific your name is within the population contributes to the linking algorithms' ability both to uniquely disambiguate your identity and to tie you to other data points more accurately and more quickly. We will also briefly cover why social graphs are especially powerful within linking algorithms, as an extension of “Which John Smith? Oh, Dave and Julie’s son.”
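The specificity idea above can be illustrated with a small sketch. This is not LexisNexis's method, which the abstract does not describe; it is a generic frequency-based weighting in which agreement on a rare value counts for more than agreement on a common one. The function names, the `last_name` field, and the toy population are all hypothetical.

```python
import math
from collections import Counter


def specificity_weights(values):
    """Map each distinct value to -log2(relative frequency).

    Rare values get high weights; common values get low weights.
    """
    counts = Counter(values)
    total = sum(counts.values())
    return {value: -math.log2(count / total) for value, count in counts.items()}


def match_score(rec_a, rec_b, weights_by_field):
    """Sum the specificity weights of the fields on which two records agree."""
    score = 0.0
    for field, weights in weights_by_field.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is not None and a == b:
            score += weights.get(a, 0.0)
    return score


# Hypothetical toy population: "Smith" is common, "Zelazny" is rare.
last_names = ["Smith"] * 6 + ["Okafor"] + ["Zelazny"]
weights = {"last_name": specificity_weights(last_names)}

common = match_score({"last_name": "Smith"}, {"last_name": "Smith"}, weights)
rare = match_score({"last_name": "Zelazny"}, {"last_name": "Zelazny"}, weights)
print(f"Smith/Smith agreement:     {common:.3f}")
print(f"Zelazny/Zelazny agreement: {rare:.3f}")
```

In this toy data, two records that share the surname "Zelazny" score higher than two that share "Smith", which is the intuition behind a specific name being stronger evidence that two records belong to the same person.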

LexisNexis has over 10,000 sources of data that stream in daily and monthly. One of the key challenges was how to automate scalable linking on a high-performance distributed supercomputer while still maintaining accuracy and precision. Accurate and precise linking helps generate a more complete picture of your identity in data, which in turn lets us combat various types of identity fraud schemes and identify compromised identities. This presentation will cover interesting examples and case study results on the following topics and phenomena, and what we learn from them.

• Zombie Identities.
  o Identities that continue to live in data, long after they have shuffled off their mortal coil.
  o Top 3 Contributors to the Zombie identity phenomenon.
  o The fine line between benign Zombie identities and outright fraud.
  o The link between Zombie identity fraud and familial identity fraud on minors.

• Identity Crowding and the invasion of the Data Snatchers.
  o What are Data Snatchers and what is the risk to you the consumer or a business?
  o Common points of compromise and high risk businesses.
  o Combatting Identity Data Snatchers and reverse engineering lists of compromised identities.

• Invisible Identities and aliens.
  o Non-US Identities that appear as legitimate US identities.
  o Foreign Students
  o Aliens who lack a public records footprint.
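As a rough illustration of the Zombie Identities item in the list above, the sketch below flags identities that show activity dated after a recorded date of death. It is a minimal example under stated assumptions, not the approach used in the talk: the record layout, the identifiers, and the event kinds are hypothetical. A flag is only a candidate for review, since the abstract itself notes a fine line between benign Zombie identities and outright fraud.

```python
from datetime import date


def find_zombie_candidates(death_dates, activity):
    """Return activity recorded after each identity's date of death.

    death_dates: dict mapping identity id -> date of death (hypothetical layout)
    activity:    iterable of (identity id, event date, event kind) tuples
    """
    flagged = {}
    for identity_id, event_date, kind in activity:
        died = death_dates.get(identity_id)
        # Only identities with a known date of death can be flagged.
        if died is not None and event_date > died:
            flagged.setdefault(identity_id, []).append((event_date, kind))
    return flagged


# Hypothetical sample data for illustration only.
death_dates = {"id-1": date(2010, 3, 1), "id-2": date(2012, 7, 15)}
activity = [
    ("id-1", date(2009, 12, 5), "address_update"),      # before death
    ("id-1", date(2013, 1, 20), "credit_application"),  # after death
    ("id-2", date(2012, 7, 1), "utility_account"),      # before death
    ("id-3", date(2014, 5, 9), "credit_application"),   # no death record
]

flagged = find_zombie_candidates(death_dates, activity)
for identity_id, events in flagged.items():
    print(identity_id, events)
```

Only `id-1` is flagged here, for the credit application dated after its recorded date of death. Telling benign post-death activity, such as delayed record updates, apart from fraud would need further signals that this sketch does not attempt to model.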

Lastly, we will briefly talk about the future: how applying machine learning at scale to identity classification, clustering, and more will help us better understand and predict the data points we expect to see within your footprint, and which deviations are associated with identity fraud.

Jesse Shaw


Mr. Shaw is a consulting software engineer at LexisNexis Risk Solutions. He is responsible for leveraging the four-petabyte core of LexisNexis data assets and spearheads big data R&D using the LexisNexis Public Data Graph, helping customers in various industries target fraud, collusion, and other red-flag social indicators. Prior to his R&D position at LexisNexis, Mr. Shaw worked in customer integration, product development, data security, and privacy compliance.