Building and maintaining complex distributed systems
June 19–20, 2017: Training
June 20–22, 2017: Tutorials & Conference
San Jose, CA

Scaling a user delivery network for real-time audience targeting

Adam Shepard (AudienceScience)
1:15pm–1:55pm Wednesday, June 21, 2017
Level: Advanced

Who is this presentation for?

  • Backend engineers, distributed systems engineers, and architects interested in high-volume and high-velocity distributed data processing and storage

Prerequisite knowledge

  • Basic knowledge of programming and distributed systems (e.g., the "Vs of big data" and CAP principles)

What you'll learn

  • Understand the unique challenges of scaling distributed systems and processing high-velocity data, and explore architectural approaches for achieving scale on highly variable data

Description

We’ve all seen those online ads that seem to follow us around the web as soon as we visit one site or check out one product, and we know that a combination of tracking technologies accumulates that data to auction our eyeballs off to the highest bidder. There are dozens of tools and technologies out there with blazing-fast performance that can serve data to processing systems in milliseconds or less, and plenty of blog posts and marketing material to support those claims. But that’s really just the tip of the iceberg. How is that data generated? How is it managed, updated, and synchronized to provide that pinpoint targeting at internet scale?

Adam Shepard peels back the covers on a user delivery network—a worldwide distributed data store powering over 80 billion transactions a day at millisecond speed. Keeping large amounts of user data in sync across multiple data centers, pulling needles of user behavior out of a haystack, and turning it all into actionable or buyable behaviors in real time presents a massive infrastructure and software challenge. Adam explores AudienceScience’s journey and evolution through several iterations of processing that data at scale, including its most recent architecture, and shares lessons learned along the way.

Adam surveys some of the original infrastructures and technologies that powered AudienceScience’s user delivery network, diving into scaling and managing MySQL, Voldemort, and Cassandra and discussing the performance characteristics and trade-offs of those technologies. Along the way, he relates some difficult lessons on procuring, provisioning, and managing hybrid infrastructures. Adam concludes with an overview of the current architecture of AudienceScience’s user delivery network, which is backed by a large-scale stream processing infrastructure built on Storm, Kafka, and Spark, with data served by a purpose-built, high-speed, asynchronous read-behind cache. He also covers the enhancements made to that architecture as it has been deployed and battle-tested at scale.

Topics include:

  • Hybrid and distributed data store operation and management
  • Hardware provisioning and software tuning
  • Eventual consistency and caching patterns
  • Low-latency Java techniques
  • Network tuning for long distances

Adam Shepard

AudienceScience

Adam Shepard is a senior software architect at AudienceScience.
