Building and maintaining complex distributed systems
June 19–20, 2017: Training
June 20–22, 2017: Tutorials & Conference
San Jose, CA

Sessions

Learn new skills and best practices from expert speakers. Velocity sessions take place Wednesday, June 21, and Thursday, June 22.

Wednesday, June 21

11:25am–12:05pm Wednesday, June 21, 2017
Location: LL20 A/B
Level: Intermediate
Karl Isenberg (Mesosphere)
Average rating: ***..
(3.71, 7 ratings)
The orchestration space is fast moving and full of competing products, platforms, and frameworks. How do you choose the right one for your requirements? Karl Isenberg explores the features of several container orchestrators, breaking down the feature sets and characteristics into categories and scoring multiple solutions against each other, and discusses what's new this year. Read more.
11:25am–12:05pm Wednesday, June 21, 2017
Location: 230 B
Level: Intermediate
Laine Campbell (OpsArtisan), Charity Majors (Honeycomb)
Average rating: **...
(2.43, 7 ratings)
SRE is becoming quite the ubiquitous term, but what about DBRE? Laine Campbell and Charity Majors dive into DBRE, exploring the paths to this craft and how to culturally evolve and support it. Laine and Charity focus on organizational scale, self-service, and force multipliers in recoverability, observability, availability, security, release management, and infrastructure. Read more.
11:25am–12:05pm Wednesday, June 21, 2017
Location: LL21 A/B
Level: Beginner
Average rating: ****.
(4.33, 3 ratings)
When embarking on a journey of transformation, you want to measure your current status and subsequent progress while keeping tabs on factors that drive improvement in technology performance. Nicole Forsgren explains the importance of knowing how (and what) to measure—ensuring you catch successes and failures when they first show up, not just when they’re epic. Read more.
11:25am–12:05pm Wednesday, June 21, 2017
Location: LL21 C/D
Level: Intermediate
Emil Stolarsky (Shopify), Justin Li (Shopify)
Average rating: ****.
(4.00, 2 ratings)
Once reserved for companies large enough to write a load balancer from scratch, load balancer middleware can be a powerful tool for scaling applications. Emil Stolarsky and Justin Li explain how Shopify uses scriptable load balancers to solve difficult infrastructure problems, such as sharding across data centers, handling flash sales, and responding quickly to DDoS attacks. Read more.
11:25am–12:05pm Wednesday, June 21, 2017
Location: LL21 E/F
Level: Non-technical
Lisa van Gelder (Bauer Xcel Media)
Average rating: ****.
(4.00, 1 rating)
Lisa van Gelder shares what she learned from an accidental A/B test. Last year, she interviewed for a new executive job at the same time as two (white, male) friends, and they compared notes. Lisa explains how "unqualified" is used to reject marginalized groups in tech and what we can do about it—both as individuals interviewing and as hiring managers looking to improve the interview process. Read more.
11:25am–12:05pm Wednesday, June 21, 2017
Location: LL20 C
Level: Intermediate
Stephen Feloney (CA Technologies)
Average rating: **...
(2.00, 5 ratings)
Delivering software continuously is a common ambition, but many face challenges pursuing this goal. Stephen Feloney shares new technologies, solutions, and best practices that make it easier for organizations to attain continuous delivery and leads a live demonstration showing end-to-end orchestration throughout the continuous delivery toolchain. Read more.
11:25am–12:05pm Wednesday, June 21, 2017
Location: LL20 D
Level: Intermediate
Martin Woodward (Microsoft)
Average rating: ***..
(3.83, 6 ratings)
Martin Woodward tells the full story of transforming Microsoft’s internal engineering systems from a collection of in-house tools built up over decades to One Engineering System with a globally distributed 24x7x365 service on the public cloud, utilizing modern techniques and industry-recognized open source technologies. Read more.
1:15pm–1:55pm Wednesday, June 21, 2017
Location: LL20 A/B
Level: Intermediate
Dharmesh Kakadia (Microsoft)
Average rating: ***..
(3.00, 4 ratings)
Orchestration systems all have different design trade-offs. Despite best efforts, these choices are not always clear to developers using these systems. Dharmesh Kakadia describes the fundamentals of scheduling and explores the scheduling algorithms implemented by various orchestration systems, highlighting similarities, differences, and the consequences of the design choices for the users. Read more.
1:15pm–1:55pm Wednesday, June 21, 2017
Location: 230 B
Level: Advanced
Adam Shepard (AudienceScience)
Adam Shepard peels back the covers on a user delivery network—a worldwide distributed data store powering over 80 billion transactions a day at millisecond speed. Join in to learn about eventually consistent data architectures, tiered and hybrid storage layers, and what it takes to manage that much data at scale. Read more.
1:15pm–1:55pm Wednesday, June 21, 2017
Location: LL21 A/B
Level: Non-technical
Timothy Gross (Joyent)
Average rating: ****.
(4.60, 5 ratings)
Conway's law tells us that "organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations." What if we turn Conway's law around? Timothy Gross explores how to make technology choices that improve the culture of your organization. Read more.
1:15pm–1:55pm Wednesday, June 21, 2017
Location: LL21 C/D
Level: Intermediate
Matt Klein (Lyft)
Average rating: ****.
(4.33, 3 ratings)
Over the past several years, Lyft has migrated from a monolith to a sophisticated service mesh powered by Envoy. Matt Klein explains why Lyft developed Envoy, focusing primarily on the operational agility that the burgeoning service mesh SOA paradigm provides, and shares lessons learned along the way. Read more.
1:15pm–1:55pm Wednesday, June 21, 2017
Location: LL21 E/F
Level: Non-technical
Julia Grace (Slack)
Average rating: ****.
(4.92, 13 ratings)
Julia Grace has built teams at IBM Research, startups, and Slack and has done due diligence for venture capitalists to determine how well a startup’s engineering team is working together. Drawing on this knowledge, Julia attempts to answer the question, Why do some teams ship features rapidly, support each other, and effectively communicate while others struggle? Read more.
1:15pm–1:55pm Wednesday, June 21, 2017
Location: LL20 D
Level: Intermediate
Michael Sage (Fugue)
Average rating: **...
(2.75, 4 ratings)
With the ready availability of cloud services, teams no longer need to invest in expensive testing environments, and no longer need to wait their turn to use them. Michael Sage demonstrates how to spin up and tear down exact clones of production environments using Jenkins 2 multibranch pipelines and Fugue. Read more.
1:15pm–1:55pm Wednesday, June 21, 2017
Location: LL20 C
Level: Intermediate
Karl Stewart (Akamai)
Average rating: ****.
(4.50, 2 ratings)
The next wave of testing is massive-scale microfocus testing, and it is uncovering millions of dollars of abandoned revenue. Karl Stewart explains how digital leaders are using massive-scale microfocus testing to guarantee their success. Read more.
2:10pm–2:50pm Wednesday, June 21, 2017
Location: LL20 A/B
Level: Intermediate
Armon Dadgar (HashiCorp)
Average rating: ****.
(4.50, 4 ratings)
Armon Dadgar offers an overview of Nomad, an application scheduler designed for both long-running services and batch jobs. Along the way, Armon explores the benefits of using schedulers for empowering developers and increasing resource utilization and how schedulers enable new next-generation application architectures. Read more.
2:10pm–2:50pm Wednesday, June 21, 2017
Location: 230 B
Level: Intermediate
Dan Jones (VictorOps)
Average rating: ****.
(4.00, 1 rating)
Dan Jones discusses VictorOps's transition to event sourcing and CQRS in distributed systems. Through the use of persistent actors, VictorOps was able to redesign, rebuild, and deploy the entire underlying infrastructure without any noticeable impact to end users. Read more.
2:10pm–2:50pm Wednesday, June 21, 2017
Location: LL21 A/B
Level: Intermediate
Laura Frank (Codeship)
Average rating: ****.
(4.25, 4 ratings)
Do you understand how quorum, consensus, leader election, and different scheduling algorithms can impact your running application? Could you explain these concepts to the rest of your team? Laura Frank explores the algorithms that power all modern container orchestration platforms and shares actionable steps to keep your highly available services highly available. Read more.
2:10pm–2:50pm Wednesday, June 21, 2017
Location: LL21 C/D
Level: Intermediate
Samir Jafferali (LinkedIn)
Average rating: ****.
(4.00, 3 ratings)
With members in every corner of the world, LinkedIn has built services around six CDNs, numerous PoPs, and three DNS platforms. Samir Jafferali explains how LinkedIn uses big data to steer DNS intelligently, optimizes the CDNs for performance, mitigates DDoSes, and measures metrics using RUM and synthetic monitoring, and shares best practices that edge teams of all sizes can benefit from. Read more.
2:10pm–2:50pm Wednesday, June 21, 2017
Location: LL21 E/F
Level: Intermediate
Average rating: ***..
(3.75, 4 ratings)
Juan Pablo Buriticá explains how to use technical RFCs as a decision-making tool in your engineering organization to increase effectiveness. When implemented properly, technical RFCs can encourage trust and delegation, respectful discussions, knowledge sharing, and accountability and support good software design. Read more.
2:10pm–2:50pm Wednesday, June 21, 2017
Location: LL20 C
Level: Beginner
Arijit Mukherji (SignalFx)
Average rating: ****.
(4.20, 5 ratings)
Modern infrastructure and DevOps practices are evolving rapidly. These trends pose a new set of monitoring challenges. Arijit Mukherji shares real-world examples demonstrating what these challenges are, some approaches that worked, and metrics system capabilities that helped SignalFx deal with the challenge. Read more.
2:10pm–2:50pm Wednesday, June 21, 2017
Location: LL20 D
Level: Advanced
Andy Smith (Wercker)
Average rating: ***..
(3.50, 2 ratings)
Micha Hernandez van Leuffen explains how current delivery systems are falling behind and why we need to change the mental model, create new best practices, and treat containers as first-class citizens. Along the way, Micha shares how Wercker implements continuous delivery in combination with Kubernetes. Read more.
3:40pm–4:20pm Wednesday, June 21, 2017
Location: LL20 A/B
Level: Intermediate
Sebastien Goasguen (Bitnami)
Average rating: **...
(2.67, 3 ratings)
Kubernetes has emerged as one of the leading container orchestrators. Sebastien Goasguen explores its architecture and compares it with other orchestration/scheduling systems, outlining the similarities and explaining why Kubernetes API primitives make all the difference. Read more.
3:40pm–4:20pm Wednesday, June 21, 2017
Location: 230 B
Level: Intermediate
Avantika Mathur (Electric Cloud)
Average rating: ***..
(3.00, 4 ratings)
Avan Mathur shares strategies for database deployments and rollbacks as well as some patterns and best practices for reliably deploying databases as part of your CD pipeline, safely rolling back database code, ensuring data integrity, and more. Read more.
3:40pm–4:20pm Wednesday, June 21, 2017
Location: LL21 A/B
Level: Intermediate
Ben Sigelman (LightStep)
Average rating: *****
(5.00, 5 ratings)
Most sudden latency regressions in a distributed system are throughput or queueing problems. Now that some monitoring technologies can observe a system with full fidelity, we can connect the dots from a high-latency outlier request to the contended resource it’s waiting on. Ben Sigelman explains why this workflow could change the way we understand critical-path latency in distributed systems. Read more.
3:40pm–4:20pm Wednesday, June 21, 2017
Location: LL21 C/D
Level: Intermediate
Lee Calcote (SolarWinds)
With application developers busily adopting container technologies, the time has come for network engineers to prepare for the unique challenges brought on by networking cloud-native applications. Lee Calcote walks you through available container connectivity options, explaining their function and when they should be used and comparing their performance characteristics. Read more.
3:40pm–4:20pm Wednesday, June 21, 2017
Location: LL21 E/F
Level: Intermediate
Kathleen Vignos (Twitter)
Average rating: ***..
(3.50, 8 ratings)
Constant change—caused by high attrition, frequent reorganization, shifting priorities, and management turnover, among other reasons—is the new normal. It takes months to onboard a new team member and get them adding value. Kathleen Vignos offers tips, shortcuts, and stories for stabilizing a team and finding a path to productivity amid the chaos. Read more.
3:40pm–4:20pm Wednesday, June 21, 2017
Location: LL20 C
Level: Intermediate
Vicky Giavelli (Micro Focus)
Average rating: **...
(2.50, 2 ratings)
Vicky Villalobos explores some of the best practices and tooling used to load and monitor a system in order to find performance and behavior issues across any OS, deployment environment, or device and shares real-life success stories from teams who are navigating these challenges on a daily basis. Read more.
3:40pm–4:20pm Wednesday, June 21, 2017
Location: LL20 D
Level: Intermediate
Brad Stoner (AppDynamics)
As release velocity increases, teams are finding innovative ways to detect and resolve performance issues earlier in the development cycle. Brad Stoner explores how to implement an automated performance testing strategy and explains how leveraging APM (application performance management) tools can reduce time to market while increasing overall quality. Read more.
4:35pm–5:15pm Wednesday, June 21, 2017
Location: LL20 A/B
Level: Intermediate
Brendan Burns (Microsoft Azure)
Average rating: *****
(5.00, 4 ratings)
Building reliable distributed systems is challenging and often bespoke, so it's hard for developers to share implementations and best practices. Brendan Burns explores common patterns for composing reliable distributed systems and shows how these patterns can be expressed via containers, so that they can be reused throughout many different applications. Read more.
4:35pm–5:15pm Wednesday, June 21, 2017
Location: LL21 A/B
Level: Intermediate
Micheal Benedict (Pinterest)
Average rating: ****.
(4.00, 4 ratings)
Companies like Twitter, Pinterest, and Uber are powered by thousands of microservices. Managing the lifecycle of services (i.e., creating them, provisioning resources, deploying, metering, charging, and deprecating) at scale proves to be challenging. Micheal Benedict discusses the need for a lifecycle manager, how to implement governance, and the impact of such a system on developer productivity. Read more.
4:35pm–5:15pm Wednesday, June 21, 2017
Location: LL21 C/D
Level: Intermediate
Devin Elliot (Unoceros)
Average rating: *....
(1.00, 1 rating)
It takes more than a one-tenth scale server-based test environment to seamlessly load balance and deliver content to millions of mobile users. Devin Elliot explains how UX for customers of major media and live streaming events was improved by leveraging idle distributed networks of smartphones and smart devices to repeatedly map, measure, and load test at scale. Read more.
4:35pm–5:15pm Wednesday, June 21, 2017
Location: LL21 E/F
Level: Beginner
Roy Rapoport (Netflix)
Average rating: *****
(5.00, 3 ratings)
When you're a scrappy startup, being nimble, agile, and flexible comes with the territory. But how do you maintain agility when you're a much, much larger company? Hope is not lost. Roy Rapoport shares critical leadership practices—focusing on encouraging failure, growing heretics, and empowering dissent—that will help you maintain a technical and organizational edge. Read more.
4:35pm–5:15pm Wednesday, June 21, 2017
Location: LL20 C
Level: Beginner
Patrick Hill (Atlassian)
Average rating: ****.
(4.25, 4 ratings)
Ever had an incident that didn't go as planned? Patrick Hill shares five values developed by Atlassian SREs to better handle incident management. Read more.
4:35pm–5:15pm Wednesday, June 21, 2017
Location: LL20 D
Eric Minick (IBM)
Average rating: ****.
(4.00, 1 rating)
An organization’s ability to adopt a DevOps approach for software delivery often hinges on a cultural transformation that may be more difficult than technology issues. Eric Minick explains how high-performing organizations have embraced culture change, as well as the impact on organizations that haven’t. If you're thinking about embarking on your own DevOps journey, remember—culture is key. Read more.

Thursday, June 22

11:25am–12:05pm Thursday, June 22, 2017
Location: LL20 A/B
Level: Intermediate
Christine Yen (Honeycomb)
Average rating: ****.
(4.60, 5 ratings)
Preaggregated metrics and time series form the backbone of many monitoring setups. They have many redeeming qualities but simply aren't sufficient for capturing or responding to the many ways things can go wrong in modern or complex systems. Christine Yen outlines the problems inherent in the use and implementation of preaggregated metrics. Read more.
11:25am–12:05pm Thursday, June 22, 2017
Location: 230 B
Level: Intermediate
Henry Robinson (Cloudera)
Average rating: *****
(5.00, 3 ratings)
It seems like everyone is building a distributed system. However, there's no common body of knowledge about how these systems should be built and scaled, beyond what is squirreled away in various academic papers. Henry Robinson shares lessons learned from over eight years spent building distributed systems and outlines a framework for thinking about distributed scaling challenges. Read more.
11:25am–12:05pm Thursday, June 22, 2017
Location: LL21 A/B
Level: Beginner
Nora Jones (Netflix)
Average rating: ****.
(4.67, 6 ratings)
Chaos engineering isn't always the most popular practice among your developers. Nora Jones covers the specifics of designing a chaos engineering solution and explains how to increment your solution technically and culturally, the socialization and evangelism pieces that tend to get overlooked in the process, and how to get developers excited about purposefully injected failure. Read more.
11:25am–12:05pm Thursday, June 22, 2017
Location: LL21 C/D
Level: Intermediate
Michael Kehoe (LinkedIn), Anil Mallapur (LinkedIn)
LinkedIn conducts regular traffic shifts during peak hours to ensure that it has sufficient capacity to handle extra load during disaster situations. Michael Kehoe and Anil Mallapur discuss how LinkedIn uses traffic shifts to mitigate user impact by migrating live traffic between its data centers and stress test site-wide services for improved capacity handling and member experience. Read more.
11:25am–12:05pm Thursday, June 22, 2017
Location: LL21 E/F
Level: Intermediate
Allison Miller (Google)
Average rating: ****.
(4.50, 2 ratings)
Automation is critical for effective operations and security ops. In large-scale systems, manual intervention has to be the exception, not the expectation. But how can security be automated, given the complexity involved? Many platforms turn to ML or AI deployed in risk models. Allison Miller discusses data-driven decision tech and explains how ML and automation create better defenses. Read more.
11:25am–12:05pm Thursday, June 22, 2017
Location: LL20 C
Level: Intermediate
Refael Botbol (CA Technologies)
Average rating: ***..
(3.00, 2 ratings)
The goal of continuous testing is to find defects earlier and release software faster, which can be achieved by integrating a set of open source functional and performance testing tools in the early stages of the software delivery lifecycle. Refael Botbol explains how to integrate open source tools like Apache JMeter and Selenium with Taurus and Jenkins as part of a continuous testing effort. Read more.
11:25am–12:05pm Thursday, June 22, 2017
Location: LL20 D
Level: Intermediate
Ranjeeth Karthik Selvan Kathiresan (Salesforce), Gurpreet Multani (Salesforce.com)
Even though HBase is considered a highly scalable distributed solution, there are cases where the schema design of HBase tables or the way a client uses an HBase cluster may impact the scalability factor of HBase. Ranjeeth Karthik Selvan Kathiresan and Gurpreet Multani outline the most important things to consider when scaling your HBase cluster to accommodate high-volume and high-velocity data. Read more.
1:15pm–1:55pm Thursday, June 22, 2017
Location: LL20 A/B
Level: Intermediate
Suman Karumuri (Pinterest)
Average rating: *****
(5.00, 2 ratings)
Distributed tracing is an emerging field of monitoring distributed systems. Suman Karumuri shares the challenges of building and deploying distributed tracing at scale using PinTrace, one of the largest distributed tracing pipelines. Drawing on real-world examples, Suman explains how traces can be used to understand, debug, and optimize your production workflows. Read more.
1:15pm–1:55pm Thursday, June 22, 2017
Location: 230 B
Level: Intermediate
Sangeeta Narayanan (Netflix)
Average rating: ***..
(3.80, 5 ratings)
Netflix operates a customizable API that allows the creation of optimized experiences on 1,000+ devices by providing developers a serverless-like platform and experience. Sangeeta Narayanan shares lessons learned operating and scaling the platform over the years and Netflix's approaches to some of the challenges it faced. Read more.
1:15pm–1:55pm Thursday, June 22, 2017
Location: LL21 A/B
Level: Intermediate
Oliver Gould (Buoyant)
Average rating: ****.
(4.33, 3 ratings)
Modern application architecture is becoming cloud native: containerized, "microserviced," and orchestrated. But resilience is more than just Docker and Kubernetes. Oliver Gould explains why companies like PayPal, Ticketmaster, and Monzo are adopting the service mesh model, where internal, service-to-service traffic is managed and instrumented with a mesh of load-balancing proxies. Read more.
1:15pm–1:55pm Thursday, June 22, 2017
Location: LL21 C/D
Level: Intermediate
David Radcliffe (Shopify)
The flexibility and speed offered by cloud computing solutions have raised the bar for bare metal deployments. Automation is essential to speedy, reliable provisioning and capacity management. David Radcliffe explores the tools Shopify uses, such as Genesis, to automate its data center and empower developers to move quickly and keep up with the times. Read more.
1:15pm–1:55pm Thursday, June 22, 2017
Location: LL21 E/F
Level: Intermediate
James Wickett (Signal Sciences)
Average rating: ****.
(4.33, 3 ratings)
Serverless is the design pattern for writing applications at scale without the necessity of managing infrastructure. It adds simplicity and a new economic model to cloud computing, but it creates some unique security challenges. James Wickett explores practical security approaches for serverless in four key areas: the software supply chain, the delivery pipeline, data flow, and attack detection. Read more.
1:15pm–1:55pm Thursday, June 22, 2017
Location: LL20 D
Level: Intermediate
Peco Karayanev (Riverbed Technology)
Average rating: ****.
(4.50, 4 ratings)
If you truly care about end-user experience and need to build highly scalable applications, you must stop treating your users, code, servers, and networks as independent systems. Peco Karayanev discusses a modern integrated visibility approach, where all monitoring shares a common data model that reveals issues previously hidden or misdiagnosed. Read more.
1:15pm–1:55pm Thursday, June 22, 2017
Location: LL20 C
Level: Intermediate
Miles Ward (Google)
Average rating: *****
(5.00, 2 ratings)
Google Cloud Spanner, Google's public launch of the internal Spanner service, makes available a new basic primitive for application design: globally consistent transactions. Want to know how it all works? Join Miles Ward for a detailed, demo-filled, nuanced look at the useful applications of Spanner for your workload. Read more.
2:10pm–2:50pm Thursday, June 22, 2017
Location: LL20 A/B
Level: Intermediate
Sneha Inguva (DigitalOcean)
Average rating: *****
(5.00, 1 rating)
Over the past year, DigitalOcean's Delivery team has been building a runtime platform based on Kubernetes with the goal of making shipping code easier. A core component of this system is a monitoring and alerting system based on Prometheus and Alertmanager. Sneha Inguva offers an overview of the system and shares problems encountered, potential solutions, and key lessons learned in the process. Read more.
2:10pm–2:50pm Thursday, June 22, 2017
Location: 230 B
Level: Beginner
Jamie Winsor (Chef Software)
Average rating: ****.
(4.50, 4 ratings)
Understanding and building distributed systems can be a daunting task, but like most other software development patterns, distributed systems mimic concepts in the real world that you're already familiar with. Jamie Winsor walks you through building a mental model to help you understand the basics of building distributed systems based on concrete, real-world systems. Read more.
2:10pm–2:50pm Thursday, June 22, 2017
Location: LL21 A/B
Level: Beginner
Ann Kilzer (Indeed)
Average rating: ****.
(4.00, 3 ratings)
Remember the old practice of the canary in the coal mine, where miners used fragile feathered friends as a failure detector for toxic gasses? In software, a canary run is a trial executed on one machine before the rest of the cluster runs. Ann Kilzer explains how Indeed created a canary service leveraging Consul’s key value store to improve the resilience of data reloads in any infrastructure. Read more.
2:10pm–2:50pm Thursday, June 22, 2017
Location: LL21 C/D
Level: Intermediate
Daniel Spoonhower (LightStep)
Average rating: ***..
(3.67, 3 ratings)
As software grows more complex, doing chargebacks and capacity planning gets more challenging. Specifically, it becomes more difficult to attribute storage and other low-level requests to high-level services. Daniel Spoonhower shows how the distributed tracing concept of context propagation can be used to overcome this problem, without any maintenance costs. Read more.
2:10pm–2:50pm Thursday, June 22, 2017
Location: LL21 E/F
Level: Intermediate
Douglas Barth (Stripe), Evan Gilman (N/A)
Average rating: ***..
(3.67, 3 ratings)
Douglas Barth and Evan Gilman offer an overview of Zero Trust, a new security model that considers all parts of the network to be equally untrusted. Doug and Evan show how to leverage a network's strengths by combining traditional SRE security approaches with novel technological arrangements while using software and hardware to secure the systems operating in those networks. Read more.
2:10pm–2:50pm Thursday, June 22, 2017
Location: LL20 C
Level: Beginner
Phil Stanhope (Oracle + Dyn)
For more than 30 years, the DNS has been one of the fundamental protocols of the internet, yet, despite its accepted importance, it has never quite gotten the due it deserves. Andy Smith explains why it's time to rethink DNS and realize the role it can play in building and running high-performance, distributed web applications. Read more.
2:10pm–2:50pm Thursday, June 22, 2017
Location: LL20 D
Dominic Williams (DFINITY)
DFINITY, a new kind of open cloud computing resource, takes the form of a decentralized network that conjures a performant "blockchain computer" with unbounded capacity that will act much like a gigantic shared mainframe for the world. Dominic Williams introduces the project and explores the foundational decentralized computing techniques it makes use of. Read more.
3:40pm–4:20pm Thursday, June 22, 2017
Location: LL20 A/B
Level: Intermediate
Brendan Gregg (Netflix)
Advanced performance observability and debugging have arrived in Linux 4.x, with enhanced BPF (eBPF). Brendan Gregg offers an overview of Linux's new dynamic and static tracing tools for the analysis of filesystems, storage, CPUs, TCP, and more. Join in to explore a new generation of tools and visualizations. Read more.
3:40pm–4:20pm Thursday, June 22, 2017
Location: 230 B
Level: Intermediate
Caitie McCaffrey (Twitter)
Average rating: ****.
(4.75, 4 ratings)
Testing and verifying distributed systems is critically important. Caitie McCaffrey shares strategies for proving a distributed system is correct, including both formal methods and more practical forms of testing, such as fault injection and property-based testing, ensuring you are confident that your systems are doing the right thing. Read more.
3:40pm–4:20pm Thursday, June 22, 2017
Location: LL21 A/B
Level: Intermediate
Gwen Shapira (Confluent), Jeff Holoman (Cloudera)
Average rating: ****.
(4.75, 4 ratings)
Kafka provides the low latency, high throughput, high availability, and scale that financial services firms require. But can it also provide complete reliability? Gwen Shapira and Jeff Holoman walk you through everything that happens to a message, from producer to consumer, and pinpoint all the places where data can be lost if you're not careful. Read more.
3:40pm–4:20pm Thursday, June 22, 2017
Location: LL21 C/D
Level: Intermediate
Patrick Reynolds (GitHub)
Average rating: *****
(5.00, 1 rating)
GitHub uses Spokes, a custom application-level replication system, to provide redundancy and scalable capacity for the Git service. Originally, Spokes was limited to a single physical site. Patrick Reynolds offers an overview of Spokes and explains how GitHub extended it to span multiple sites, transparently providing read-anywhere, write-anywhere replication for all Git content. Read more.
3:40pm–4:20pm Thursday, June 22, 2017
Location: LL21 E/F
Level: Intermediate
Pete Cheslock (Threat Stack)
Pete Cheslock shares the operational and security practices that helped Threat Stack scale while staying stable and secure, covering technology and tools and the various scale points that forced hard decisions. Read more.
4:35pm–5:15pm Thursday, June 22, 2017
Location: LL20 A/B
Level: Intermediate
Megan Anctil (Slack)
Average rating: ****.
(4.83, 6 ratings)
One size definitely doesn't fit all when it comes to open source monitoring solutions, and executing generally understood best practices in the context of unique distributed systems presents all sorts of problems. Megan Anctil shares pain points and lessons learned at Slack wrangling known technologies such as Icinga, Graphite, Grafana, and the Elastic Stack to best fit the company's use cases. Read more.
4:35pm–5:15pm Thursday, June 22, 2017
Location: 230 B
Level: Intermediate
Simon de Haan (Praekelt.org)
Average rating: ****.
(4.00, 1 rating)
Developing reliable healthcare systems requires careful integration of a country’s health, tech, and legal ecosystems. In Africa, locally built resilient distributed systems are needed to meet the demand of national-scale digital health services and data sovereignty laws. Simon de Haan explores the challenges of building in these environments and the solutions that have proven to work. Read more.
4:35pm–5:15pm Thursday, June 22, 2017
Location: LL21 A/B
Level: Intermediate
Aaron Blohowiak (Netflix)
Average rating: ****.
(4.50, 4 ratings)
Chaos Monkey and Kong changed the culture around infrastructure failure, but the most common cause of downtime is service failure. Turning off an entire service in production is too risky. Aaron Blohowiak offers an overview of precision chaos techniques that verify service-level fault tolerance and reveal hidden resource constraints while minimizing potential fallout. Read more.
4:35pm–5:15pm Thursday, June 22, 2017
Location: LL21 C/D
Level: Intermediate
Jack Chan (Shutterfly)
Average rating: *****
(5.00, 1 rating)
Jack Chan describes how Shutterfly migrated metadata from over 10B photos from a private data center into AWS in 100 days and explores designs to absorb mountains of metadata, on-premises ecommerce integration, and parallel user experiences, all in a highly scalable fashion. Shutterfly Photos is now a hybrid cloud solution with images hosted on-premises and client-facing photos metadata on AWS. Read more.
4:35pm–5:15pm Thursday, June 22, 2017
Location: LL21 E/F
Level: Intermediate
Average rating: ****.
(4.00, 1 rating)
Fastly operates the edge for many large web properties. To deal with emerging threats to its network, Fastly created a process that allows it to respond effectively to incidents: Incident Command, which rapidly coordinates teams during an incident. Maarten Van Horenbeeck and Lisa Phillips take you to the far side of the edge, demonstrating the protocols that work during an incident. Read more.