Monkeys & Math: How MailChimp Catches Bad Guys

MailChimp is an email service provider (ESP) that sends over three billion newsletters a month for two million active users. Their business model is predicated on successful delivery of clients’ newsletters to the inboxes of recipients. To maintain their reputation and delivery rate with major ISPs like Gmail, Yahoo, and Hotmail, MailChimp must prevent any spam or spam-like email from being sent over its system.

Small-scale ESPs commonly employ a team of compliance officers to make sure the content leaving the company is not spam. Now that MailChimp is one of the largest ESPs globally, this is no longer possible: no human team could check three billion newsletters a month manually.

To that end, MailChimp has created the Email Genome Project (EGP), a predictive analytics system capable of predicting the performance of email marketing campaigns before they leave the company’s system. EGP is built on a massive, horizontally scalable data store populated with the billions of send and engagement records MailChimp has gathered over the past ten years. The system uses an in-memory NoSQL store to augment its disk storage and provide lightning-fast predictions to the application. Bad actors are identified in real time and purged from the system before they can do damage. As an added benefit, MailChimp uses the email graph data in EGP to provide good users with targeted interest data about their subscribers.

This talk will cover:

  • The business need for MailChimp to create the Email Genome Project
  • The storage technologies and analytic techniques used in the predictive modeling system
  • The revenue and scale benefits of transitioning to an automated, data-driven compliance process
  • How the email graph data in the system is used to provide real-time analytics to users concerning their subscribers’ engagement and interests

John Foreman


John Foreman is the Chief Data Scientist for MailChimp, where he leads the company’s data product development effort, the Email Genome Project. He also runs the Data Science for Managers course at Analytics Made Skeezy.

John holds a graduate degree in Operations Research from MIT and has worked as an analytics consultant for the Department of Defense, Coca-Cola, Royal Caribbean International, and Intercontinental Hotels Group. His expertise is in optimization modeling, revenue management, and predictive modeling.

