Build Systems that Drive Business
June 11–12, 2018: Training
June 12–14, 2018: Tutorials & Conference
San Jose, CA

Distributed Data sessions

Data can be a competitive advantage if used correctly. This track explores the technical challenges and lessons learned in managing distributed state in large-scale applications that reliably process millions of events per second. Learn proven strategies and gain new insights from leading practitioners into how to handle real-time data in streams and events.

Track host

Nathan Taylor is an Oakland-based software developer. He has hacked on low-level systems software such as the Twitter Java runtime and the Xen virtual machine monitor. Originally a trombone major, he holds an M.Sc. from the University of British Columbia, where he researched full-system binary rewriting and dynamic analysis systems. When not in front of a computer, you're likely to find him either baking bread or suffering up a steep hill on his bike.

11:25am–12:05pm Wednesday, June 13, 2018
Location: 230 B
Level: Beginner
Secondary topics: Distributed State
John Mumm (Wallaroo Labs)
Average rating: 5.00 (1 rating)
Coordination is a common source of performance problems when dealing with distributed state. John Mumm shares strategies for avoiding coordination and relying on local knowledge wherever possible, along with the pros and cons of, and tips for, using in-memory state instead of the typical approach of relying on external data stores.
1:15pm–1:55pm Wednesday, June 13, 2018
Location: 230 B
Level: Intermediate
Secondary topics: Systems Architecture & Infrastructure
Lena Hall (Microsoft)
Average rating: 4.12 (8 ratings)
Data is generated at an ever-increasing rate, so your architecture for ingesting these influxes of data needs to be flexible, scalable, fast, and resilient. Lena Hall walks you through using distributed systems like Apache Kafka and Spark Streaming to ingest data from multiple sources in real time, process it, and perform machine learning tasks.
2:10pm–2:50pm Wednesday, June 13, 2018
Location: 230 B
Level: Beginner
Secondary topics: Systems Architecture & Infrastructure
Miro Cupak (DNAstack)
The Beacon Network is the largest search and discovery engine for human genomic data in the world. Miro Cupak details the architecture and technologies behind the system, with a focus on the technical decisions that allow it to scale and disrupt the perception of genetic data.
3:40pm–4:20pm Wednesday, June 13, 2018
Location: 230 B
Level: Beginner
Secondary topics: Systems Monitoring & Orchestration
Victoria Nguyen (Fastly)
Average rating: 5.00 (3 ratings)
Victoria Nguyen explains how Fastly overhauled the monitoring and data collection of its globally distributed network without its caches noticing.
4:35pm–5:15pm Wednesday, June 13, 2018
Location: 230 B
Level: Advanced
Secondary topics: Distributed State
Jon Tirsen (Square)
Average rating: 5.00 (3 ratings)
Jon Tirsen explains how Square scaled out the backend for its Cash App using Vitess, a database middleware for MySQL built at YouTube.