Build Resilient Distributed Systems
June 19–20, 2017: Training
June 20–22, 2017: Tutorials & Conference
San Jose, CA

Speakers

New speakers are added regularly. Please check back to see the latest updates to the agenda.

Peter Alvaro is an Assistant Professor of Computer Science at the University of California, Santa Cruz, where he leads the Disorderly Labs research group. His research focuses on using data-centric languages and analysis techniques to build and reason about data-intensive distributed systems, in order to make them scalable, predictable, and robust to the failures and nondeterminism endemic to large-scale distribution. Peter earned his PhD at UC Berkeley, where he studied with Joseph M. Hellerstein. He is a recipient of the NSF CAREER award.

Presentations

Orchestrating chaos: Applying database research in the wild Keynote

In this keynote, Peter will describe the theoretical roots of lineage-driven fault injection (LDFI) in the database research notion of provenance, share early results from the field, and outline opportunities for near- and long-term future research.

Megan Anctil is a senior engineer on the Technical Operations team at Slack. She enjoys deep dives in debugging and long walks on the beach with her #MonitoringLove(s).

Presentations

Our many monitoring monsters Session

One size definitely doesn't fit all when it comes to open source monitoring solutions, and executing generally understood best practices in the context of unique distributed systems presents all sorts of problems. Megan Anctil shares pain points and lessons learned at Slack wrangling known technologies such as Icinga, Graphite, Grafana, and Elastic Stack to best fit the company's use cases.

Doug Barth is a site reliability engineer at Stripe. Doug has a deep interest in software, hardware, and production systems and has spent his career using computers to solve hard problems. He helped deploy PagerDuty’s IPsec mesh network and is now writing Zero Trust Networks.

Presentations

Zero Trust networks: Building systems in untrusted networks Session

Douglas Barth and Evan Gilman offer an overview of Zero Trust, a new security model that considers all parts of the network to be equally untrusted. Doug and Evan show how to leverage a network's strengths by combining traditional SRE security approaches with novel technological arrangements while using software and hardware to secure the systems operating in those networks.

Micheal Benedict leads product management for Pinterest’s cloud and data infrastructure. Previously, Micheal led products for Twitter Cloud Platform, building next-generation compute services that span internal and public clouds. He and his team built Kite, a service lifecycle manager and an infrastructure metering and chargeback system. Prior to that, he was an engineer building systems that powered Twitter’s observability and monitoring stack. Micheal holds a master’s degree in computer science from the State University of New York at Buffalo.

Presentations

Managing the microservices lifecycle: The what, why, and how Session

Companies such as Twitter, Pinterest, and Uber are powered by thousands of microservices. Managing the lifecycle of services (i.e., creating them, provisioning resources, deploying, metering, charging, and deprecating) at scale proves to be challenging. Micheal Benedict discusses the need for a lifecycle manager, how to implement governance, and the impact of such a system on developer productivity.

Marcus Blankenship is an author, trainer, and consultant who helps companies improve their software delivery teams and processes. Fifteen years ago, he made the leap from a senior programmer/architect designing product configuration expert systems to leading teams and departments, and he has done so at global enterprises and his own software consultancy. Marcus has worked extensively as a consultant and trainer with manufacturing, digital agencies, and SaaS companies. Marcus is also the author of 7 Habits That Ruin Your Technical Team.

Presentations

Technology leadership: Building better people ops 2-Day Training

Engineers who become managers are experts at the technical aspects of their job, but they are often unprepared for the human and political challenges they face. Marcus Blankenship teaches engineering leaders a framework for building strong relationships with their teams, creating a driven culture, and communicating upward and outward to benefit their teams.

Aaron Blohowiak is a senior software engineer on the Chaos and Traffic team at Netflix. Aaron has a decade of experience taking down production, learning from mistakes, and striving to build ever more resilient systems.

Presentations

Precision chaos Session

Chaos Monkey and Kong changed the culture around infrastructure failure, but the most common cause of downtime is service failure. Turning off an entire service in production is too risky. Aaron Blohowiak offers an overview of precision chaos techniques that verify service-level fault tolerance and reveal hidden resource constraints while minimizing potential fallout.

Juan Pablo Buriticá is the vice president of engineering at Splice, where he leads a distributed team throughout the US and Latin America building a cloud platform for music creation, collaboration, and sharing. Juan Pablo has built effective software engineering organizations by emphasizing open source software values, technical excellence, trust, and empathy. He has organized five global software engineering conferences, spoken at multiple events, and founded and led the growth of Colombia’s JavaScript community, the largest Spanish-speaking JS community in the world, with more than 5,000 members.

Presentations

Technical decision making for teams, the open source way Session

Juan Pablo Buriticá explains how to use technical RFCs as a decision-making tool in your engineering organization to increase effectiveness. When implemented properly, technical RFCs can encourage trust, delegation, respectful discussion, knowledge sharing, and accountability, while supporting good software design.

Brendan Burns is a partner architect at Microsoft Azure, where he runs the Container Service and Resource Manager teams, and a cofounder of the Kubernetes open source project. Previously, he worked at Google on cloud APIs and web search infrastructure and was a professor of computer science at Union College. Brendan holds a PhD in computer science from the University of Massachusetts Amherst and a BA in computer science and studio art from Williams College.

Presentations

Democratizing distributed systems: Building reusable distributed system patterns using containers Session

Building reliable distributed systems is challenging and often bespoke, so it's hard for developers to share implementations and best practices. Brendan Burns explores common patterns for composing reliable distributed systems and shows how these patterns can be expressed via containers so that they can be reused across many different applications.

Tammy Butow is a site reliability engineering manager at Dropbox, where she is the team lead for the Databases & Magic Pocket SRE teams. She enjoys working on infrastructure engineering and is interested in chaos engineering, antifragile systems, automation, Go, and Linux. Previously, Tammy worked in security engineering and product engineering. She is the cofounder of Girl Geek Academy, a global movement to teach 1 million women technical skills by 2025. Girl Geek Academy received support from the Australian prime minister and a grant from the Australian government in 2016 to scale the Miss Makes Code program, which is aimed at teaching algorithms to 5- to 8-year-old girls. An Australian, Tammy currently lives in San Francisco, where she likes to ride bikes, skateboard, snowboard, and surf. She also loves mosh pits, crowd surfing, metal, and hardcore punk.

Presentations

Chaos engineering bootcamp Tutorial

Chaos engineering is the discipline of experimenting on a distributed system in order to build confidence in the system’s capability to withstand turbulent conditions in production. Tammy Butow leads a hands-on tutorial on chaos engineering, covering the tools and practices you need to implement chaos engineering in your organization.

Lee Calcote is the senior director of technology strategy at SolarWinds. Lee is an innovative thought leader who is passionate about developer platforms and management software for clouds, containers, infrastructure, and applications. Advanced and emerging technologies have been a consistent focus through Calcote’s tenure at SolarWinds, Seagate, Cisco, and Pelco. Lee is active in the tech community and is an organizer of technology meetups and conferences, a writer, author, and speaker.

Presentations

The over-under on container networking Session

With application developers busily adopting container technologies, the time has come for network engineers to prepare for the unique challenges brought on by networking cloud-native applications. Lee Calcote walks you through available container connectivity options, explaining their function and when they should be used and comparing their performance characteristics.

Jack Chan is a senior engineering manager in Shutterfly’s Photos group. He was recently heavily involved in the company's hybrid cloud migration, which paired photos-related API services on AWS with a set of core services in a private data center. Jack has spent much of his career in software engineering, helping startups scale up to millions of users with cloud solutions. Previously, he worked in IT organizations at Adobe, Apple, and 3Com.

Presentations

How Shutterfly migrated 10+ billion photos to the cloud Session

Jack Chan describes how Shutterfly migrated the metadata for more than 10 billion photos from a private data center into AWS in 100 days and explores designs to absorb mountains of metadata, on-premises ecommerce integration, and parallel user experiences, all in a highly scalable fashion. Shutterfly Photos is now a hybrid cloud solution with images hosted on-premises and client-facing photos metadata on AWS.

Colin Charles is the chief evangelist at Percona. Previously, Colin was on the founding team of MariaDB Server, worked at MySQL, and worked actively on the Fedora and OpenOffice.org projects. Colin has been a MySQL user since 2000. He’s well known within open source communities in APAC and has spoken at many conferences.

Presentations

Best practices for MySQL high availability Tutorial

The MySQL world is full of trade-offs; choosing a high-availability (HA) solution is no exception, but only with high availability can you achieve distributed systems in your database layer. Colin Charles explores the MySQL high availability landscape, offering deep dives into current technologies, recommendations, and what to look out for.

Pete Cheslock is the head of Threat Stack’s operations and support teams, where he focuses on delivering the highest level of service, reliability, and customer satisfaction to Threat Stack’s growing user base. An industry veteran with over 15 years’ experience in operations, Pete understands the challenges and issues faced by security, development, and operations professionals every day. Previously, Pete held senior positions at Dyn and Sonian, where he built, managed, and developed automation and release engineering teams and projects.

Presentations

Scale it to a billion: How to build it, keep it safe, and keep it running Session

Pete Cheslock shares the operational and security practices that helped Threat Stack scale while staying stable and secure, covering technology and tools and the various scale points that forced hard decisions.

Armon Dadgar is the CTO of HashiCorp, where he brings distributed systems into the world of DevOps tooling. Armon has a passion for distributed systems and their application to real-world problems. He has worked on Nomad, Vault, Terraform, Consul, and Serf at HashiCorp and maintains the Statsite and Bloomd OSS projects.

Presentations

Nomad and next-generation application architectures Session

Armon Dadgar offers an overview of Nomad, an application scheduler designed for both long-running services and batch jobs. Along the way, Armon explores the benefits of using schedulers for empowering developers and increasing resource utilization and how schedulers enable new next-generation application architectures.

Tom Daly is vice president of infrastructure at Fastly. Previously, Tom cofounded Dyn Inc. and served as its president, chief technology officer, and chief scientist. Tom holds a BS in electrical and computer engineering from Worcester Polytechnic Institute and an MBA from Bentley University.

Presentations

Incident Command: The far side of the edge Session

Fastly operates the edge for many large web properties. To deal with emerging threats to its network, Fastly created a process that allows it to respond effectively to incidents: Incident Command, which rapidly coordinates teams during an incident. Maarten Van Horenbeeck, Lisa Phillips, and Tom Daly take you to the far side of the edge, demonstrating the protocols that work during an incident.

Simon de Haan is the chief engineer at Praekelt.org. Simon has a rare talent for demystifying software systems and platforms for nonengineers. Previously, he was the team lead on Praekelt.org’s Vumi platform, an open source messaging platform that allows for interactive conversations over SMS, USSD, Google Talk, and other basic technologies at low cost and at population scale in the majority world. Vumi is the technology that powers various groundbreaking initiatives such as Wikipedia Text, PeaceTXT, MomConnect, MAMA, and the Libyan election registrations. Prior to joining Praekelt, Simon was CTO at Soocial.com, a senior developer at Eight.nl, and the owner of Fission.nl. Simon has hosted various talks, webinars, and hackathons about his passion for development and for building systems that can scale and can help our partners reach their audiences with life-saving information. Growing up in the Middle East ruined him for the ordinary, as those years left an unmistakable impression and set him on a direct course to be involved in community development, entrepreneurship, and technology.

Presentations

No place like home: Building resilient distributed systems locally in Africa Session

Developing reliable healthcare systems requires careful integration of a country’s health, tech, and legal ecosystems. In Africa, locally built resilient distributed systems are needed to meet the demand of national-scale digital health services and data sovereignty laws. Simon de Haan explores the challenges of building in these environments and the solutions that have proven effective.

Bart De Vylder is a data scientist at CoScale. Previously, Bart was active in software engineering and architecture, with a focus on distributed systems. His interests lie in machine learning and building reliable, scalable data processing systems. Bart holds a PhD in artificial intelligence from the Free University of Brussels.

Presentations

A hands-on data science crash course for modeling and predicting the behavior of (large) distributed systems Tutorial

Data science is a hot topic. Bart De Vylder offers a practical introduction that goes beyond the hype, exploring data analysis, visualization, and machine-learning techniques using Python for modeling the behavior of distributed systems. You'll leave with a solid starting point to implement data science techniques in your infrastructure or domain of interest.

Jason DuMars is director of operations at Rally Software, where he is leading the effort to create and implement a truly Agile operations group and helping define an enterprise-scaled version of DevOps that can be leveraged by other organizations. Jason has divided his career between being a sysadmin and managing technology groups, but his highest aspiration is to be a servant leader for his tribe. He has been a *nix systems administrator since 1993, and long before that held the title of SysOp on various BBS systems. In the fourth grade, he hacked the source code of Oregon Trail on an Apple II Plus. (His classmates may have “died of poo” instead of “starvation” as originally intended by the game’s authors.) Jason firmly believes that operations engineers of all stripes are the glue that holds everything together. His heart will always belong to the scruffy sysadmins, DBAs, analytics wranglers, network engineers, and security teams—a group of wildly diverse technologists whose days, nights, and sometimes weekends are abruptly interrupted by text messages reading “HOST UNREACHABLE” or “HIGH LOAD WARNING.”

Presentations

Real-world Kubernetes 2-Day Training

Kubernetes has emerged as the leading platform for containerized applications. Lachlan Evenson and Jason DuMars offer a deep dive into Kubernetes, from concept to implementation, sharing detailed explanations of its architecture, security, and use cases.

Dinesh Dutt is chief scientist at Cumulus Networks. Dinesh has been in the networking industry for 15 years, most of it spent at Cisco Systems, where he was involved in enterprise and data center networking technologies, including the design of many of the ASICs that powered Cisco’s megaswitches, such as Cat6K and the Nexus family of switches. He also has experience in storage networking from his days at Andiamo Systems and in the design of FCoE. Dinesh is a coauthor of TRILL and VxLAN and has filed for over 40 patents.

Presentations

Troubleshooting data center networks: Fresh tools and perspectives Tutorial

Dinesh Dutt explores network troubleshooting, explaining how to avoid common network problems ranging from misconfigured cabling to misbehaving protocols, how a modern networking tool chest can help simplify network configurations, and how automation is improving troubleshooting turnaround times to minimize downtime.

Devin Elliot is the founder of Unoceros. Previously, Devin worked in weather data tech, designed flavor molecules from bacteria, painted houses, and was a professional snowboarder.

Presentations

Edge infrastructure will save you from your mobile traffic nightmares Session

It takes more than a one-tenth scale server-based test environment to seamlessly load balance and deliver content to millions of mobile users. Devin Elliot explains how UX for customers of major media and live streaming events was improved by leveraging idle distributed networks of smartphones and smart devices to repeatedly map, measure, and load test at scale.

Lachlan Evenson is lead operations engineer at Deis. An evangelist passionate about open source projects and communities, Lachlan has presented at a number of conferences and events, including OpenStack Summit in Tokyo and Vancouver 2015, OpenStackSV 2015, KubeCon, Kubernetes v1 Launch, and various community events. He is a co-organizer of 900+ meetup groups in the Bay Area and a current member of the OpenContrail Advisory Board.

Presentations

Real-world Kubernetes 2-Day Training

Kubernetes has emerged as the leading platform for containerized applications. Lachlan Evenson and Jason DuMars offer a deep dive into Kubernetes, from concept to implementation, sharing detailed explanations of its architecture, security, and use cases.

Bret Fisher is a Virginia Beach-based freelance DevOps and Docker consultant, trainer, speaker, and open source volunteer. Bret has been a cloud and data center ops and system administrator for 20 years. Currently, he helps teams Dockerize their apps and systems and improve their speed of deployment, resiliency, metrics, and awareness (all that DevOps-y stuff). Bret is a Docker Captain and Code for America Brigade Captain. He runs several monthly meetups, speaks at conferences, and is obsessed with containerizing any app he sees (he’ll likely talk your ear off about it next time you meet). Bret also develops in Node.js, Bash, and general web, usually for open source projects. In his free time, he does CrossFit, surfs a little, geeks out in the awesome local dev community in Virginia Beach, and travels with his wife.

Presentations

Docker production: Orchestration, security, and beyond Tutorial

Starting where previous Docker workshops leave off, Bret Fisher, Laura Frank, and Tony Pujals dive into the new Swarm mode clustering (services), failover, blue-green deployments, monitoring, logging, troubleshooting, and security, covering the latest built-in features and common third-party tools as they walk you through installing them on your own five-node cloud Swarm cluster.

Nicole Forsgren is the CEO and chief scientist at DevOps Research and Assessment (DORA). Nicole is an IT impacts expert who is best known for her work with tech professionals and as the lead investigator on the largest DevOps studies to date. She is a consultant, expert, and researcher in knowledge management, IT adoption and impacts, and DevOps. In a previous life, she was a professor, sysadmin, and hardware performance analyst. Nicole has been awarded public and private research grants (funders include NASA and the NSF), and her work has been featured in various media outlets, peer-reviewed journals, and conferences. She holds a PhD in management information systems and a master’s degree in accounting.

Presentations

Are we there yet? Signposts on your journey to awesome Session

When embarking on a journey of transformation, you want to measure your current status and subsequent progress while keeping tabs on factors that drive improvement in technology performance. Nicole Forsgren explains the importance of knowing how (and what) to measure—ensuring you catch successes and failures when they first show up, not just when they’re epic.

Camille Fournier is the former head of engineering at Rent the Runway. She was previously a vice president at Goldman Sachs. Camille is an Apache ZooKeeper committer and PMC member and a Dropwizard framework PMC member.

Presentations

The role of being technical in technical leadership Keynote

What does it mean to be a technical leader? There is compelling evidence that technical workers want leaders who are strong technologists, leaders they believe they can learn from.

Susan J. Fowler is the author of Production-Ready Microservices. She is currently an engineer at Stripe. Previously, Susan worked on microservice standardization at Uber, developed application platforms and infrastructure at several small startups, and studied particle physics at the University of Pennsylvania.

Presentations

Keynote by Susan Fowler Keynote

Details to come.

Laura Frank is a Docker Captain and the director of engineering at Codeship, where she works on improving the Docker infrastructure and overall experience for all users of the CI/CD platform. Previously, she worked on several open source projects to support Docker in the early stages of the project, including Panamax and ImageLayers. Laura lives in Berlin, where she can be found eating döner or attempting to try every type of gin in the world.

Presentations

Docker production: Orchestration, security, and beyond Tutorial

Starting where previous Docker workshops leave off, Bret Fisher, Laura Frank, and Tony Pujals dive into the new Swarm mode clustering (services), failover, blue-green deployments, monitoring, logging, troubleshooting, and security, covering the latest built-in features and common third-party tools as they walk you through installing them on your own five-node cloud Swarm cluster.

Everything you thought you already knew about orchestration Session

Do you understand how quorum, consensus, leader election, and different scheduling algorithms can impact your running application? Could you explain these concepts to the rest of your team? Laura Frank explores the algorithms that power all modern container orchestration platforms and shares actionable steps to keep your highly available services highly available.

Christopher Fulton is a global technical account manager at Electric Cloud, where he leads professional services efforts for large FinServe customers and helps implement ElectricFlow for Electric Cloud’s largest customers. Previously, he spent 10 years as a build and release manager, automating week-long processes down to hours. Chris holds a degree in computer science and religion.

Presentations

CD for DBs: Database deployment strategies Session

Chris Fulton shares strategies for database deployments and rollbacks as well as some patterns and best practices for reliably deploying databases as part of your CD pipeline, safely rolling back database code, ensuring data integrity, and more.

Evan Gilman is a site reliability engineer at PagerDuty. With roots in academia, Evan finds passion in both reliable, performant systems and the networks they run on. When he’s not building automated systems for PagerDuty, he can be found at the nearest pinball table or working on his upcoming book, Zero Trust Networks.

Presentations

Zero Trust networks: Building systems in untrusted networks Session

Douglas Barth and Evan Gilman offer an overview of Zero Trust, a new security model that considers all parts of the network to be equally untrusted. Doug and Evan show how to leverage a network's strengths by combining traditional SRE security approaches with novel technological arrangements while using software and hardware to secure the systems operating in those networks.

Sebastien Goasguen is the founder of Skippbox, a Kubernetes startup that develops open source tools for Kubernetes users. Sebastien is a 20-year open source veteran. A member of the Apache Software Foundation, he worked on Apache CloudStack and Libcloud for several years before diving into the container world. He is an avid blogger and enjoys spreading the word about new cutting-edge technologies. He also trains developers and sysadmins on all things Docker and Kubernetes. Sebastien is the author of the O’Reilly Docker Cookbook and 60 Recipes for Apache CloudStack.

Presentations

Scheduling containers with Kubernetes: Is it that different than other schedulers? Session

Kubernetes has emerged as one of the leading container orchestrators. Sebastien Goasguen explores its architecture and compares it with other orchestration/scheduling systems, outlining the similarities and explaining why Kubernetes API primitives make all the difference.

Sasha Goldshtein is the CTO of Sela Group, a Microsoft C# MVP and Azure MRS, a Pluralsight author, and an international consultant and trainer. Sasha’s consulting work revolves mainly around distributed architecture, production debugging, and mobile application development. Sasha is the author of Introducing Windows 7 for Developers (Microsoft Press, 2009) and Pro .NET Performance (Apress, 2012). He is also a prolific blogger and the author of numerous training courses, including .NET Debugging, .NET Performance, Android Application Development, and Modern C++.

Presentations

Linux performance monitoring with BPF Tutorial

Sasha Goldshtein leads a hands-on workshop on Linux dynamic tracing. You'll explore the BPF Compiler Collection (BCC), a set of tools and libraries for dynamic tracing, and gain firsthand experience with memory leak analysis, generic function tracing, kernel tracepoints, static tracepoints in user-space programs, and the baked-in tools for file I/O, network, and CPU analysis.

Oliver Gould is the CTO of Buoyant, where he leads open source development efforts. Previously, he was a staff infrastructure engineer at Twitter, where he was the tech lead of the Observability, Traffic, and Configuration and Coordination teams. Oliver is the creator of linkerd and a core contributor to Finagle, the high-volume RPC library used at Twitter, Pinterest, SoundCloud, and many other companies.

Presentations

The service mesh: distributed resilience for a cloud native world Session

Modern application architecture is shifting to the cloud native: containerized, built from microservices, and orchestrated. But resilience is more than just Docker and Kubernetes. Oliver Gould explains why companies like PayPal, Ticketmaster, and Monzo are adopting the "service mesh" model, where internal, service-to-service traffic is managed and instrumented with a mesh of load-balancing proxies.

Julia Grace is the director of infrastructure engineering at Slack. Previously, she was cofounder and CTO of Tindie, a marketplace for electronics funded by Andreessen Horowitz, where she built out and led the engineering team from founding through acquisition. Prior to joining the startup world, she spent several years building systems at IBM Research. Julia holds a BS and MS in computer science from the University of North Carolina at Chapel Hill with a focus on distributed systems. She is an avid runner and once starred in a TV commercial.

Presentations

10,000 messages a minute: Lessons learned from building engineering teams under pressure Session

Julia Grace has built teams at IBM Research, startups, and Slack and has done due diligence for venture capitalists to determine how well a startup’s engineering team is working together. Drawing on this knowledge, Julia attempts to answer the question of why some teams ship features rapidly, support each other, and communicate effectively while others struggle.

Brendan Gregg is a senior performance architect at Netflix, where he does large-scale computer performance design, evaluation, analysis, and tuning. Previously, Brendan worked as a performance and kernel engineer. He has created performance analysis tools included in multiple operating systems, as well as visualizations and methodologies. Brendan is the author of Systems Performance. He received the USENIX LISA Award for outstanding achievement in system administration.

Presentations

Performance analysis superpowers with Linux eBPF Session

Advanced performance observability and debugging have arrived in Linux 4.x with enhanced BPF (eBPF). Brendan Gregg offers an overview of Linux's new dynamic and static tracing tools for the analysis of filesystems, storage, CPUs, TCP, and more. Join in to explore a new generation of tools and visualizations.

Timothy Gross is a product manager for Joyent, the provider of the Triton Elastic Container Service. Previously, Tim ran ops at DramaFever, where he and his scrappy team ran Docker in production to serve a few million fans their daily dose of dramas, documentaries, and gross-out horror movies. In another life, Tim was an architect (buildings, not software). He took the leap into programming and operations after he discovered he could automate away almost everything boring in his life.

Presentations

Software-defined culture Session

Conway's law tells us that "organizations which design systems. . .are constrained to produce designs which are copies of the communication structures of these organizations." What if we turn Conway's law around? Timothy Gross explores how to make technology choices that improve the culture of your organization.

Kelsey Hightower has worn every hat possible throughout his career in tech but most enjoys leadership roles focused on making things happen and shipping software. Kelsey is a strong open source advocate focused on building simple tools that make people smile. When he is not slinging Go code, you can catch him giving technical workshops covering everything from programming and system administration to his favorite Linux distro of the month (CoreOS).

Presentations

Keynote by Kelsey Hightower Keynote

Details to come.

Jeff Holoman is a systems engineer at Cloudera. Jeff is a Kafka contributor and has focused on helping customers with large-scale Hadoop deployments, primarily in financial services. Prior to his time at Cloudera, Jeff worked as an application developer, system administrator, and Oracle technology specialist.

Presentations

When it absolutely, positively, has to be there: Reliability guarantees in Kafka Session

Kafka provides the low latency, high throughput, high availability, and scale that financial services firms require. But can it also provide complete reliability? Gwen Shapira and Jeff Holoman walk you through everything that happens to a message, from producer to consumer, and pinpoint all the places where data can be lost if you're not careful.

Sneha Inguva is an enthusiastic software engineer working on building developer tooling at DigitalOcean. Previously, Sneha worked at a number of startups. Her experience across an eclectic range of verticals, from education to 3D printing to casinos, has given her a unique perspective on building and deploying software. When she isn’t bashing away on a project or reading about the latest emerging technology, Sneha is busy molding the minds of young STEM enthusiasts in local NYC schools.

Presentations

Observability in a dynamically scheduled world Session

Over the past year, DigitalOcean's Delivery team has been building a runtime platform based on Kubernetes with the goal of making shipping code easier. A core component of this system is a monitoring and alerting system based on Prometheus and Alertmanager. Sneha Inguva offers an overview of the system and shares problems encountered, potential solutions, and key lessons learned in the process.

Karl Isenberg is a distributed systems architect at Mesosphere working on DC/OS (the Datacenter Operating System). Prior to Mesosphere, Karl worked on CloudFoundry and BOSH at Pivotal. Karl’s current side projects include Probe (a service-ready check), Inject (a Golang dependency injection library), and Mesos Compose Docker-in-Docker. Karl is, as of this writing, the only person to have been a committer on CloudFoundry, Kubernetes, and DC/OS, so he is uniquely qualified to address the container platform market, cloud-native frameworks, life-cycle management strategies, and deployment tools in general. Karl’s publications include Obfuscation, an irregularly updated tech blog, and a more active stream of technology-related tweets.

Presentations

Container orchestration wars Session

The orchestration space is fast moving and full of competing products, platforms, and frameworks. How do you choose the right one for your requirements? Karl Isenberg explores the features of several container orchestrators, breaking down the feature sets and characteristics into categories and scoring multiple solutions against each other, and discusses what's new this year.

Samir Jafferali is a staff SRE at LinkedIn. Samir is passionate about everything that makes the internet tick.

Presentations

Orchestrating multihomed cloud services for a fast and resilient edge Session

With members in every corner of the world, LinkedIn has built services around six CDNs, numerous PoPs, and three DNS platforms. Samir Jafferali explains how LinkedIn uses big data to steer DNS intelligently, optimize its CDNs for performance, mitigate DDoS attacks, and measure performance with RUM and synthetic monitoring, and he shares best practices that edge teams of all sizes can benefit from.

Dan Jones is the CTO and cofounder at VictorOps, where he supports the company’s goal of making on-call suck less. He is intimately familiar with what it takes to keep a business running when the slightest outage means lost revenue and unhappy customers. With almost 30 years in the software industry, Dan has spent the last 20 years architecting and building scalable 24/7 internet services designed to be “always on.” Previously, Dan was chief architect and vice president of engineering at two successful startups, Raindance Communications and Lijit Networks.

Presentations

The move to event sourcing and CQRS in distributed systems Session

Dan Jones discusses VictorOps's transition to event sourcing and CQRS in distributed systems. Through the use of persistent actors, VictorOps was able to redesign, rebuild, and deploy the entire underlying infrastructure without any noticeable impact to end users.

Nora Jones is a senior chaos engineer at Netflix. Nora is passionate about delivering high-quality software, improving processes, and promoting efficiency within architecture. Occasionally, she pokes holes in distributed systems to make them more resilient.

Presentations

The road to chaos Session

Chaos engineering isn't always the most popular practice among your developers. Nora Jones covers the specifics of designing a chaos engineering solution, how to grow that solution incrementally both technically and culturally, the socialization and evangelism pieces that tend to get overlooked in the process, and how to get developers excited about purposefully injected failure.

Dharmesh Kakadia is a developer and a researcher at Microsoft, where he works on distributed systems. Dharmesh is the author of Apache Mesos Essentials. He is passionate about open source and likes to work at the intersection of data and cloud. He enjoys reading in his free time.

Presentations

Scheduling deep dive for orchestration systems Session

Orchestration systems all have different design trade-offs. Despite best efforts, these choices are not always clear to developers using these systems. Dharmesh Kakadia describes the fundamentals of scheduling and explores the scheduling algorithms implemented by various orchestration systems, highlighting similarities, differences, and the consequences of the design choices for the users.

Suman Karumuri is the lead for distributed tracing at Pinterest. Previously, he served as the lead for the Zipkin project at Twitter. He is the author of the upcoming O'Reilly book Distributed Tracing.

Presentations

PinTrace: A distributed tracing pipeline Session

Distributed tracing is an emerging field of monitoring distributed systems. Suman Karumuri shares the challenges of building and deploying distributed tracing at scale using PinTrace, one of the largest distributed tracing pipelines. Drawing on real-world examples, Suman explains how traces can be used to understand, debug, and optimize your production workflows.

Ann Kilzer is a site reliability engineer at Indeed. Previously, she worked in backend development and privacy research. Ann holds a master’s degree in computer science from the University of Texas. She enjoys textile arts and trains with the local circus.

Presentations

Canary in a coal mine: Building infrastructure resiliency with canary data reloads Session

Remember the old practice of the canary in the coal mine, where miners used fragile feathered friends as a failure detector for toxic gases? In software, a canary run is a trial executed on one machine before the rest of the cluster runs. Ann Kilzer explains how Indeed created a canary service leveraging Consul's key-value store to improve the resilience of data reloads in any infrastructure.

Matt Klein is a software engineer at Lyft and the architect of Envoy. Matt has been working on operating systems, virtualization, distributed systems, and networking and making systems easy to operate for 15 years across a variety of companies. Some highlights include leading the development of Twitter’s C++ L7 edge proxy and working on high-performance computing and networking in Amazon’s EC2.

Presentations

Lyft's Envoy: Experiences operating a large service mesh Session

Over the past several years, Lyft has migrated from a monolith to a sophisticated "service mesh" powered by Envoy. Matt Klein explains why Lyft developed Envoy, focusing primarily on the operational agility that the burgeoning service mesh SOA paradigm provides, and shares lessons learned along the way.

Justin Li is a production engineer at Shopify, where he works on performance, parsers, and distributed systems. To unwind after making the computers go fast, he attempts to make the office karts go fast instead.

Presentations

Standing on the shoulders of giants: Unleashing the power of scriptable load balancers Session

Once reserved for companies large enough to write a load balancer from scratch, load balancer middleware can be a powerful tool for scaling applications. Emil Stolarsky and Justin Li explain how Shopify uses scriptable load balancers to solve difficult infrastructure problems, such as sharding across data centers, handling flash sales, and responding quickly to DDoS attacks.

Bryan Liles works on the Cloud Engineering team at Capital One. When not helping a huge bank move to the public cloud, he gets to speak at conferences on topics ranging from machine learning to building the next generation of developers. In his free time, Bryan races cars in straight lines and around turns and builds robots and devices.

Presentations

Application tracing tutorial Tutorial

In the past, applications were monolithic, and tracing flows for performance and bottlenecks was straightforward, as there was likely a single code base. In today's world, with multiple processes constituting a single application, tracing becomes more challenging. Bryan Liles offers a hands-on demonstration for implementing tracing in modern applications.

Tyler McMullen is CTO of Fastly, where he is responsible for the system architecture and leads the company’s technology vision. As part of the founding team, Tyler built the first versions of Fastly’s instant purging system, API, and real-time analytics. Before Fastly, Tyler worked on text analysis and recommendations at Scribd. A self-described technology curmudgeon, Tyler has experience in everything from web design to kernel development and loathes all of it. Especially distributed systems.

Presentations

Building a skyscraper with Legos: The anatomy of a distributed system Session

The practical realities of distributed systems are rarely straightforward. Tyler McMullen walks you through a system built to perform very high volumes of health checks, done across a cluster of machines for reliability and scalability. Tyler discusses each of the major components in turn to show how they are practically built and the pain and compromises that they bring.

Sangeeta Narayanan leads the Edge Developer Experience team at Netflix, which focuses on creating solutions that increase development velocity and provide operational insight into system health and behavior. Sangeeta has held various roles in her career in fields such as test engineering, sales engineering, and engineering management. Throughout all those experiences, the common theme has been her passion for simplifying the process of developing and operating software.

Presentations

Lessons learned from operating a serverless-like platform at scale Session

Netflix operates a customizable API that allows the creation of optimized experiences on more than 1,000 devices by providing developers a serverless-like platform and experience. Sangeeta Narayanan shares lessons learned operating and scaling the platform over the years and Netflix's approaches to some of the challenges it faced.

Courtney Nash chairs multiple conferences for O’Reilly Media and is the strategic content director focused on areas of modern web operations, high-performance applications, and security. An erstwhile academic neuroscientist, she is still fascinated by the brain and how it informs our interactions with and expectations of technology. She’s spent 17 years working in the technology industry in a wide variety of roles, ever since moving to Seattle to work at a burgeoning online bookstore. Outside work, Courtney can be found biking, hiking, skiing, and photographing the Cascade Mountains near her home in Bellingham, Washington.

Presentations

Thursday opening welcome Keynote

Program chairs Courtney Nash, James Turnbull, and Ines Sombra open the second day of keynotes.

Wednesday opening welcome Keynote

Program chairs Courtney Nash, James Turnbull, and Ines Sombra open the first day of keynotes.

Lisa Phillips is vice president of site reliability engineering at Fastly. With 18 years of experience in Internet and Web technologies with emphasis on systems and database administration, architecture, engineering, and management, Lisa isn't afraid of hard problems or scale. She brings extensive experience in the implementation and management of Internet services to ensure the highest levels of system availability and performance globally.

Presentations

Incident Command: The far side of the edge Session

Fastly operates the edge for many large web properties. To deal with emerging threats to its network, Fastly created a process that allows it to respond effectively to incidents: Incident Command, which rapidly coordinates teams during an incident. Maarten Van Horenbeeck, Lisa Phillips, and Tom Daly take you to the far side of the edge, demonstrating the protocols that work during an incident.

Tony Pujals is a Docker Captain and the director of cloud engineering at Appcelerator, where he focuses on improving the process of building, deploying, orchestrating, and monitoring containerized microservices. Tony is fanatical about Docker, Go, Node.js, APIs, microservices, serverless computing, distributed systems, and scalable cloud architecture. He is a co-organizer of the Mountain View Docker meetup.

Presentations

Docker production: Orchestration, security, and beyond Tutorial

Starting where previous Docker workshops leave off, Bret Fisher, Laura Frank, and Tony Pujals dive into the new Swarm mode clustering (services), failover, blue-green deployments, monitoring, logging, troubleshooting, and security, covering the latest built-in features and common third-party tools as they walk you through installing them on your own five-node cloud Swarm cluster.

David Radcliffe is a production engineer lead at Shopify. He moonlights on the Ops team for RubyGems.org and is active in the open source community.

Presentations

Genesis: Automating data center management with help from PXE and Chef Session

The flexibility and speed offered by cloud computing solutions have raised the bar for bare metal deployments. Automation is essential to speedy, reliable provisioning and capacity management. David Radcliffe explores the tools Shopify uses, such as Genesis, to automate its data center and empower developers to move quickly and keep up with the times.

I've spent more than twenty years in technology across a wide variety of disciplines, from UNIX systems engineering to network architecture and implementation, software development, software testing, and release management. I've got a passion for leading highly effective, highly motivated teams, and the technical background and acumen to pitch in and be hands-on. I code for fun. I'm a master of no trade, but I have a keen holistic vision for how people and technologies work together most effectively.

I'm currently managing the Insight Engineering organization at Netflix, where we write the powerful telemetry platform, and the graphics, alerting, and analytics systems on top of it, that give Netflix complete real-time visibility into its operations and systems: in the cloud, on customer devices, and anywhere else Netflix operates. At more than a billion data points per minute, it's where big data meets real-time visibility, and data is refined into information, insight, and operational intelligence.

Presentations

From placid planners to passionate pioneers: In pursuit of the next thing Session

When you're a scrappy startup, being nimble, agile, and flexible comes with the territory. But how do you maintain agility when you're a much, much larger company? Hope is not lost. Roy Rapoport shares critical leadership practices (encouraging failure, growing heretics, and empowering dissent) that will help you maintain a technical and organizational edge.

Henry Robinson is a software engineer at Cloudera, where he works on a variety of distributed systems, including Apache ZooKeeper and Apache Impala.

Presentations

How to scale a distributed system Session

It seems like everyone is building a distributed system. However, there's no common body of knowledge about how these systems should be built and scaled, beyond what is squirreled away in various academic papers. Henry Robinson shares lessons learned from over eight years spent building distributed systems and outlines a framework for thinking about distributed scaling challenges.

John Sasser is an AWS Certified Solutions Architect—Professional with more than 20 years’ experience designing, deploying, and managing mission-critical systems in the cloud for a variety of organizations, including Apple, EMC, the DOD, and the DOJ.

Presentations

Building resilient systems on AWS 2-Day Training

John Sasser shares best practices for designing and deploying resilient, fault-tolerant systems on AWS and offers deep dives into managed versus unmanaged services, monitoring and observability, high-availability design patterns, fault-tolerant and self-healing systems, disaster recovery and business continuity approaches, and DDoS mitigation.

Gwen Shapira is a system architect at Confluent, where she helps customers achieve success with their Apache Kafka implementation. She has 15 years of experience working with code and customers to build scalable data architectures, integrating relational and big data technologies. Gwen currently specializes in building real-time reliable data-processing pipelines using Apache Kafka. Gwen is an Oracle Ace Director, the coauthor of Hadoop Application Architectures, and a frequent presenter at industry conferences. She is also a committer on Apache Kafka and Apache Sqoop. When Gwen isn’t coding or building data pipelines, you can find her pedaling her bike, exploring the roads and trails of California and beyond.

Presentations

When it absolutely, positively, has to be there: Reliability guarantees in Kafka Session

Kafka provides the low latency, high throughput, high availability, and scale that financial services firms require. But can it also provide complete reliability? Gwen Shapira and Jeff Holoman walk you through everything that happens to a message, from producer to consumer, and pinpoint all the places where data can be lost if you're not careful.

Adam Shepard is a senior software architect at AudienceScience.

Presentations

Scaling a user delivery network for real-time audience targeting Session

Adam Shepard peels back the covers on a user delivery network—a worldwide distributed data store powering over 80 billion transactions a day at millisecond speed. Join in to learn about eventually consistent data architectures, tiered and hybrid storage layers, and what it takes to manage that much data at scale.

Ben Sigelman is the cofounder and CEO of LightStep, where he’s building reliability management for modern systems. An expert in distributed tracing, Ben is the coauthor of the OpenTracing standard, a project within the Linux Foundation’s Cloud Native Computing Foundation (CNCF). Previously, he built Dapper, Google’s production distributed systems tracing infrastructure, and Monarch, Google’s fleet-wide time series collection, storage, analysis, and alerting system. Ben holds a BSc in mathematics and computer science from Brown University.

Presentations

The holy grail of systems analysis: From what to where to why Session

Most sudden latency regressions in a distributed system are throughput or queueing problems. Now that some monitoring technologies can observe a system with full fidelity, we can connect the dots from a high-latency outlier request to the contended resource it’s waiting on. Ben Sigelman explains why this workflow could change the way we understand critical-path latency in distributed systems.

Ines Sombra is a director of engineering at Fastly, where she spends her time helping the Web go faster. Ines holds an MS in computology with an emphasis on cheesy '80s rock ballads. She has a fondness for steak, fernet, and a pug named Gordo. In a previous life, Ines was a data engineer.

Presentations

Thursday opening welcome Keynote

Program chairs Courtney Nash, James Turnbull, and Ines Sombra open the second day of keynotes.

Wednesday opening welcome Keynote

Program chairs Courtney Nash, James Turnbull, and Ines Sombra open the first day of keynotes.

Emil Stolarsky is a production engineer at Shopify, where he works on performance, scriptable load balancers, and DNS tooling. When he’s not trying to make Shopify’s global performance heat map green, he’s shivering over a spiked cup of coffee in the great Canadian north.

Presentations

Standing on the shoulders of giants: Unleashing the power of scriptable load balancers Session

Once reserved for companies large enough to write a load balancer from scratch, load balancer middleware can be a powerful tool for scaling applications. Emil Stolarsky and Justin Li explain how Shopify uses scriptable load balancers to solve difficult infrastructure problems, such as sharding across data centers, handling flash sales, and responding quickly to DDoS attacks.

James Turnbull is the CTO of Empatico. A long-time member of the open source community, James is the author of nine technical books about open source software: The Terraform Book, The Art of Monitoring, The Logstash Book, The Docker Book, Pro Puppet, Pulling Strings with Puppet, Pro Linux System Administration, Pro Nagios 2.0, and Hardening Linux. He was formerly CTO at Kickstarter and an advisor at Docker. James likes food, wine, books, photography, and cats. He is not overly keen on long walks on the beach and holding hands.

Presentations

Thursday opening welcome Keynote

Program chairs Courtney Nash, James Turnbull, and Ines Sombra open the second day of keynotes.

Wednesday opening welcome Keynote

Program chairs Courtney Nash, James Turnbull, and Ines Sombra open the first day of keynotes.

Lisa van Gelder is senior vice president of technology at Bauer Xcel Media. Lisa has been writing software for over 17 years, ever since she started building websites in 1999, but she soon discovered that backend problems were way more fun. Her career has taken her from small startups to large media organizations between London and New York, including the Guardian newspaper and the BBC. She is mostly powered by coffee.

Presentations

A/B testing sexism: Interviewing as a female executive in tech Session

Lisa van Gelder shares what she learned from an accidental A/B test: this year, she interviewed for a new executive job at the same time as two friends (both white men), and they compared notes. Lisa explains how "unqualified" is used to reject marginalized groups in tech and what we can do about it, both as individuals interviewing and as hiring managers looking to improve the interview process.

Maarten Van Horenbeeck is vice president of security engineering at Fastly, a content delivery network that speeds up web properties around the world. He is also a board member and former chairman of the Forum of Incident Response and Security Teams (FIRST), the largest association of security teams, with 300 members in over 70 countries. Previously, Maarten managed the Threat Intelligence team at Amazon and worked on the Security teams at Google and Microsoft. Maarten holds a master's degree in information security from Edith Cowan University and a master's degree in international relations from the Freie Universität Berlin. When not working, he enjoys backpacking, sailing, and collecting first-edition travel literature.

Presentations

Incident Command: The far side of the edge Session

Fastly operates the edge for many large web properties. To deal with emerging threats to its network, Fastly created a process that allows it to respond effectively to incidents: Incident Command, which rapidly coordinates teams during an incident. Maarten Van Horenbeeck, Lisa Phillips, and Tom Daly take you to the far side of the edge, demonstrating the protocols that work during an incident.

Seth Vargo is the director of technical advocacy at HashiCorp. Previously, he worked at Chef (Opscode), CustomInk, and a few Pittsburgh-based startups. He is the author of Learning Chef. Seth is passionate about reducing inequality in technology. When he is not writing, working on open source, teaching, or speaking at conferences, Seth enjoys spending time with his friends and advising nonprofits. He loves all things bacon.

Presentations

Microservices secrets management with Vault Tutorial

It's great that you've moved to microservices, but how are you distributing secrets? Seth Vargo explains how Vault's unique approach to secrets management, which provides secrets as a service for your services (and your humans too), makes it highly scalable and easily customizable to fit any environment.

Kathleen Vignos is a full stack engineer turned manager who has led engineering teams at Twitter and Wired. She’s worked at two startups (one of which she founded), traveled the western US for management consulting and professional services, taught business software programming at the university level, won a hackathon, and built dozens of websites. Other experiences include everything from being on call as a COBOL programmer for Y2K to modifying a React app for a hack week project. She holds engineering degrees from UCLA and Michigan.

Presentations

Managing engineering teams through constant change Session

Constant change—caused by high attrition, frequent reorganization, shifting priorities, and management turnover, among other reasons—is the new normal. It takes months to onboard a new team member and get them adding value. Kathleen Vignos offers tips, shortcuts, and stories for stabilizing a team and finding a path to productivity amid the chaos.

Miles Ward is global head of solutions for Google Cloud, where he focuses on everything from delivering next-generation solutions to challenges in big data and analytics, application migration, infrastructure automation, and cost optimization. Miles is a three-time technology startup entrepreneur with a decade of experience building cloud infrastructures. Previously, he was a core part of the Obama for America 2012 “tech” team, crashed Twitter a few times, helped NASA stream the Curiosity Mars Rover landing, and put Skype back online in a pinch. He also plays a mean electric sousaphone.

Presentations

Google Cloud Spanner: Global consistency at scale Session

Google Cloud Spanner, the public release of Google's internal Spanner service, makes available a new basic primitive for application design: globally consistent transactions. Want to know how it all works? Join Miles Ward for a detailed, demo-filled, nuanced look at the useful applications of Spanner for your workload.

James Wickett is head of research at Signal Sciences, where he works at the intersection of the DevOps and security communities. James is a supporter of the Rugged Software and Rugged DevOps movements. Seeing the gap in software testing, James founded Gauntlt, an open source project, to serve as a Rugged testing framework. He is the author of Hands-on Gauntlt and DevOps Fundamentals on Lynda.com. James got his start in technology when he founded a startup as a student at the University of Oklahoma. He has worked in environments ranging from large, web-scale enterprises to small, rapid-growth startups. He is a dynamic speaker on topics in DevOps, InfoSec, cloud security, security testing, Rugged DevOps, and serverless. James is the creator and founder of the Lonestar Application Security Conference, the largest annual security conference in Austin, TX. He also runs DevOps Days Austin and is on the global DevOps Days board. James holds several security certifications, including CISSP and GWAPT. In his spare time, he's trying to learn how to make a perfect BBQ brisket.

Presentations

Serverless security: A pragmatic primer for builders and defenders Session

Serverless is a design pattern for writing applications at scale without having to manage infrastructure. It adds simplicity and a new economic model to cloud computing, but it creates some unique security challenges. James Wickett explores practical security approaches for serverless in four key areas: the software supply chain, the delivery pipeline, data flow, and attack detection.

Christine Yen is the cofounder of Honeycomb, a startup with a new approach to observability and debugging systems with data. Christine has built systems and products at companies large and small and likes to have her fingers in as many pies as possible. Previously, she built Parse’s analytics product (and leveraged Facebook’s data systems to expand it) and wrote software at a few now-defunct startups.

Presentations

The problem with preaggregated metrics Session

Preaggregated metrics and time series form the backbone of many monitoring setups. They have many redeeming qualities but simply aren't sufficient for capturing or responding to the many ways things can go wrong in modern or complex systems. Christine Yen outlines the problems inherent in the use and implementation of preaggregated metrics.