Engineering the Future of Software

Feb 3–4, 2019: Training
Feb 4–6, 2019: Tutorials & Conference

New York, NY

Building a robust content recommendation platform for 60 million news readers

Matt Chapman (mPulse Mobile)

4:50pm–5:40pm Wednesday, February 6, 2019

Application architecture
Location: Grand Ballroom West

Secondary topics: Case Study

Who is this presentation for?

Data engineers, data scientists, software engineers, and operations engineers

Level

Intermediate

Prerequisite knowledge

A basic understanding of databases, microservices, containers, and machine learning (useful but not required)

What you'll learn

Learn how to build a horizontally scalable system for delivering real-time machine learning services

Description

In 2016, Tribune Publishing began building an in-house data science team to better leverage its vast datasets with new machine learning and analytics technologies. One of the primary successes of this team was its content recommendation system (“RecSys”), developed entirely in house on top of existing open source systems and new open source libraries created and released by Tribune.

Requirements for the RecSys included the ability to perform A/B/n testing against legacy human-edited and algorithmic recommendations, support multiple publications with both shared and exclusive content, support “real-time” online machine learning at scale, scale without limit in the face of traffic spikes, and gracefully degrade when responses can’t be delivered within a given time limit.
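Two of these requirements, A/B/n testing and graceful degradation under a time limit, can be illustrated with a minimal Python sketch. This is not Tribune's implementation. The variant names, the function names (`assign_variant`, `recommend_with_deadline`), and the 50 ms default deadline are all assumptions made for illustration. The sketch hashes a user ID into one of n buckets so that a reader always sees the same variant, and it returns a precomputed fallback list when the model call fails or misses its deadline.

```python
import hashlib
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

# Hypothetical variant names; the abstract only says that legacy human-edited
# and algorithmic recommendations are tested against the new system.
VARIANTS = ["human_edited", "legacy_algorithm", "recsys"]

# Small shared worker pool for model calls (pool size is an arbitrary choice).
_executor = ThreadPoolExecutor(max_workers=4)


def assign_variant(user_id, variants=VARIANTS):
    """Deterministically map a user ID to one of n variants (A/B/n bucketing)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


def recommend_with_deadline(model_fn, user_id, fallback, timeout_s=0.05):
    """Return model_fn(user_id), or `fallback` if the call is late or raises."""
    future = _executor.submit(model_fn, user_id)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        # Best effort only: cancel() cannot interrupt a call already running.
        future.cancel()
        return fallback
    except Exception:
        # Any model failure degrades to the fallback instead of an error page.
        return fallback


def slow_model(user_id):
    """Stand-in for a model that misses the deadline (hypothetical)."""
    time.sleep(0.3)
    return ["late-article"]


def fast_model(user_id):
    """Stand-in for a model that answers in time (hypothetical)."""
    return ["article-1", "article-2"]
```

Calling `recommend_with_deadline(slow_model, "reader-42", ["most-read-1"])` yields the fallback list, while the same call with `fast_model` yields the model's own output. Hash-based bucketing needs no stored assignment table, which suits a horizontally scaled service, although a production system would likely also salt the hash per experiment.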

Matt Chapman leads a walkthrough of the lifecycle of a request, from the web browser of a news-reading end user to the backend algorithms that generate up-to-the-moment, personalized recommendations for what the user might want to read next. Along the way, Matt reviews the challenges that the team faced, the open source solutions used at each step, and the new framework and libraries developed by the team to make development of algorithms and of the system itself fast, flexible, and scalable.

Topics include:

Docker container orchestration with DCOS: What’s wrong with Kubernetes? (Nothing.)
Why so much Python? Isn’t it too slow? (No. Well, yes, but it doesn’t matter.)
Cassandra, Scylla, Memcached, and Redis: Who won the data store shoot-out? (No one.)
How to pickle TensorFlow models (or anything at all)
Kafka and when to avoid using a message broker (usually)
Operations monitoring with Graphite: What metrics matter? (Not many.)
ZMQ and why microservice architectures aren’t too slow (You’re too slow.)

Matt Chapman

mPulse Mobile

Matt Chapman is manager of data engineering at mPulse Mobile. Previously, he was the lead data engineer for Tribune Interactive. A hands-on leader of software engineering, Matt has professional programming experience with at least eight languages, many databases, and even more frameworks. He’s formed and contributed to teams of technologists for companies in a broad spectrum of industries, including web publishing, real estate, event management, finance, media, and healthcare, most recently focusing on applications of big data, machine learning, and data science.
