Build Systems that Drive Business
Sep 30–Oct 1, 2018: Training
Oct 1–3, 2018: Tutorials & Conference
New York, NY

Distributed Data sessions

How we leverage distributed data and state is today’s key competitive advantage. This track explores the technical challenges and lessons learned in managing distributed state in large-scale applications that reliably process millions of events per second. Learn proven strategies and gain new insights from leading practitioners into how to handle real-time data in streams and events.

Track host

Baron Schwartz (VividCortex) is the founder and CTO of VividCortex, the best way to see what your production database servers are doing. Baron has written a lot of open source software and several books, including _High Performance MySQL_. He has focused his career on learning and teaching about the performance and observability of systems in general (including the view that teams are systems and that culture influences their performance) and of databases in particular.

11:35am–12:15pm Tuesday, October 2, 2018
Location: Nassau
Level: Intermediate
Secondary topics: Resilient, Performant & Secure Distributed Systems
Ameet Kotian (Slack)
Average rating: 5.00 (1 rating)
Slack’s rapid growth over the last few years outpaced the original database’s scaling capacity, which negatively impacted the company’s customers and engineers. Ameet Kotian explains how a small team of engineers embarked on a journey for the right database solution, which eventually led them to Vitess, an open source cluster database.
2:25pm–3:05pm Tuesday, October 2, 2018
Location: Nassau
Level: Intermediate
Secondary topics: Resilient, Performant & Secure Distributed Systems
Kristina Bennett (Google)
Average rating: 3.50 (2 ratings)
Kristina Bennett shares best practices for practical data recoverability and shines a light on some of the pitfalls awaiting the unwary, based on lessons learned from five years of data integrity tooling and consulting across Google.
3:50pm–4:30pm Tuesday, October 2, 2018
Location: Nassau
Level: Beginner
Secondary topics: Resilient, Performant & Secure Distributed Systems
Bart De Vylder (CoScale)
Average rating: 4.00 (2 ratings)
Bart De Vylder shares his experience migrating an existing codebase and production environment to Kafka Streams, a relatively new and promising streaming library. Join in to see what aspects worked remarkably well and the challenges he ran into along the way.
4:45pm–5:25pm Tuesday, October 2, 2018
Location: Nassau
Level: Beginner
Secondary topics: Resilient, Performant & Secure Distributed Systems
James Meickle (Quantopian)
Average rating: 5.00 (2 ratings)
Quantopian integrates financial data from vendors around the globe. As the scope of its operations outgrew cron, the company turned to Apache Airflow, a distributed scheduler and task executor. James Meickle explains how, in less than six months, Quantopian was able to rearchitect brittle crontabs into resilient, recoverable pipelines that are defined in code and to which anyone could contribute.
1:30pm–2:10pm Wednesday, October 3, 2018
Location: Nassau
Level: Intermediate
Secondary topics: Systems Architecture & Infrastructure
Leemay Nassery (Comcast)
Average rating: 2.00 (1 rating)
Leemay Nassery discusses the importance of data collection pipelines and explains how to efficiently store datasets with the intention of making them easily accessible by a downstream machine learning platform.
3:50pm–4:30pm Wednesday, October 3, 2018
Location: Nassau
Level: Intermediate
Secondary topics: Distributed State
Alexander Rasmussen (Freenome)
In the past five years, Alexander Rasmussen has spent a lot of time trying to get high-integrity data out of spreadsheets and into databases. Alexander explores common data integrity problems when dealing with spreadsheet data, investigates whether those integrity problems are inescapable, and shares ongoing work to mitigate them.