Build & maintain complex distributed systems

October 1–2, 2017: Training
October 2–4, 2017: Tutorials & Conference

New York, NY

Speaker slides

Presentation slides will be made available after the session has concluded and the speaker has given us the files. Check back if you don't see the file you're looking for—it might be available later! (However, please note some speakers choose not to share their presentations.)

A hands-on data science crash course for modeling and predicting the behavior of (large) distributed systems

Bart De Vylder (CoScale), Pieter Buteneers (CoScale)

View slides

Data science is a hot topic. Bart De Vylder and Pieter Buteneers offer a practical introduction that goes beyond the hype, exploring data analysis, visualization, and machine learning techniques using Python for modeling the behavior of distributed systems. You'll leave with a solid starting point to implement data science techniques in your infrastructure or domain of interest.

Blockchains and cryptocurrencies: New paradigms for shared data

Neha Narula (Digital Currency Initiative)

Watch the keynote

Bitcoin showed us a new way of moving value around the internet without intermediaries. Neha Narula explains how this paradigm might apply to our traditional ways of thinking about databases that cross organizational boundaries. As data on the web becomes consolidated around a few key players, the blockchain might help users gain more control.

Creating pipelines to build, test, and deploy containerized artifacts

Tom Adams (ThoughtWorks)

View slides

Containerization has launched a new wave of software deployment models, but do our philosophies for building, testing, and deploying software still hold true? Tom Adams walks you through creating a build pipeline for Docker images that is rooted in continuous integration (CI) practices.

Customer-centric observability

Mark McBride (Turbine Labs)

Download slides (PDF)

With the recent flourishing of observability systems, there's no shortage of things to monitor. Sadly, humans have limited capacity to process them all. Mark McBride outlines three key metrics—request rate, success rate, and the latency histogram—that provide a high-level abstraction of the customer experience. If these three metrics are good, your system is healthy from a customer perspective.
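
As a rough illustration of the three metrics named above, the following Python sketch computes request rate, success rate, and a latency histogram from a batch of request records. The `Request` type, the bucket boundaries, and the rule that any status below 500 counts as a success are assumptions made for this example; they are not taken from the talk.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Request:
    timestamp: float    # seconds since the start of the window
    status: int         # HTTP status code
    latency_ms: float   # time taken to serve the request


# Hypothetical histogram bucket upper bounds, in milliseconds.
BUCKETS_MS = (10, 50, 100, 500, 1000)


def summarize(requests, window_s):
    """Return (requests per second, success rate, latency histogram)."""
    total = len(requests)
    rate = total / window_s

    # Assumption: only server-side errors (5xx) count as failures.
    ok = sum(1 for r in requests if r.status < 500)
    success = ok / total if total else 1.0

    hist = Counter()
    for r in requests:
        # Pick the first bucket whose upper bound covers the latency.
        label = next(
            (f"<={b}ms" for b in BUCKETS_MS if r.latency_ms <= b),
            f">{BUCKETS_MS[-1]}ms",
        )
        hist[label] += 1
    return rate, success, dict(hist)


sample = [
    Request(0, 200, 8),
    Request(1, 200, 40),
    Request(2, 500, 700),
    Request(3, 404, 1200),
]
rate, success, hist = summarize(sample, window_s=4)
```

Keeping the full histogram, rather than a single average, is what makes slow outliers such as the 1200 ms request visible.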

Debugging complex systems

Terran Melconian (Air Network Simulation and Analysis)

Download slides (PDF)

Terran Melconian explores an organized process for observing a misbehaving complex system, reasoning about possible causes, and isolating the fault. While it is not generally taught, all the successful senior engineers with operational experience Terran has talked to use a variant of this process.

Developing resilient microservices with Kubernetes and Envoy

Phil Lombardi (Datawire), Rafael Schloming (Datawire), Richard Li (Datawire)

Download slides (PDF)

Microservices are an increasingly popular approach to building cloud-native applications, and dozens of new technologies that streamline adopting microservices development, such as Docker, Kubernetes, and Envoy, have been released over the past few years. Phil Lombardi, Rafael Schloming, and Richard Li walk you through actually using these technologies to develop, deploy, and run microservices.

ETW: Monitor anything, anytime, anywhere

Dina Goldshtein (Riverbed)

Download slides (PDF)

Event Tracing for Windows (ETW) is the most important diagnostic tool Windows developers have at their disposal. Dina Goldshtein explores the rich and wonderful world of ETW events, which span numerous OS components. You’ll learn how to diagnose complex issues in production systems and discover ways to automate ETW collection and analysis to build self-diagnosing applications.

Fly the airplane (sponsored by NS1)

Kristopher Beevers (NS1)

Watch the keynote

During active operational incidents, we experience very human reactions that get in the way of resolution. Approaches like Incident Command provide solid foundations for incident response. Kristopher Beevers explains how to augment Incident Command with simple tools and processes that help your team focus, communicate effectively, and respond calmly and precisely during mission-critical events.

Four things I wish I'd known sooner about persistent memory

Rob Dickinson (resurface.io)

Download slides (PDF)

On the surface, adapting software to use persistent memory seems obvious. After all, persistent memory is simply fast memory that maintains state when the power goes out, like an SSD. But unlike SSDs, persistent memory challenges long-held ideas and conventions about how software works. Rob Dickinson outlines four key ideas that will help focus your persistent memory strategy.

FPGAs in the cloud?

Julien Simon (AWS)

Download slides (PDF)

FPGAs have become a hot topic in the IT industry, thanks to the unprecedented computing power that they bring to demanding HPC applications, and AWS recently introduced FPGA-powered instances (aka F1 instances) to make the process simpler and quicker. Julien Simon walks you through building an FPGA-enabled application, from design to simulation to synthesis to execution on an F1 instance.

From zero to distributed traces: An OpenTracing tutorial

Bryan Liles (Heptio), Yuri Shkuro (Uber Technologies), Won Jun Jang (Uber), Prithvi Raj (Uber)

Download slides (PDF)

Yuri Shkuro, Bryan Liles, Won Jun Jang, and Prithvi Raj walk you through implementing distributed tracing in modern applications, using the CNCF’s OpenTracing project. You'll explore a set of sample applications and learn how to instrument them for tracing. You'll also use a tracing system such as Jaeger, Zipkin, or LightStep to visualize complex transactions that might span multiple processes.
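
To make the idea of instrumenting an application for tracing concrete, here is a minimal Python sketch of parent and child spans that share a trace ID. It uses only the standard library and is deliberately not the OpenTracing API: the `span` helper, the `FINISHED` list, and the operation names are hypothetical stand-ins for a real tracer and its reporter.

```python
import contextvars
import time
import uuid
from contextlib import contextmanager

# Tracks the span that is currently active in this execution context.
_current = contextvars.ContextVar("current_span", default=None)

# Hypothetical stand-in for a reporter that would ship spans to a backend.
FINISHED = []


@contextmanager
def span(operation):
    """Open a span that becomes a child of whichever span is active."""
    parent = _current.get()
    record = {
        "trace_id": parent["trace_id"] if parent else uuid.uuid4().hex,
        "span_id": uuid.uuid4().hex[:16],
        "parent_id": parent["span_id"] if parent else None,
        "operation": operation,
        "start": time.time(),
    }
    token = _current.set(record)
    try:
        yield record
    finally:
        record["duration_s"] = time.time() - record["start"]
        _current.reset(token)      # restore the parent as the active span
        FINISHED.append(record)    # a real tracer would report it here


def lookup_user():
    with span("db.lookup_user"):
        time.sleep(0.01)           # simulated database call


def handle_request():
    with span("http.handle_request"):
        lookup_user()


handle_request()
```

Within one process the active span is tracked implicitly. To follow a transaction across processes, the trace and span IDs would also have to travel with each outgoing request, for example in request headers, so that the receiving side can continue the same trace.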

FTFY: Research advances in automatic bug repair

Claire Le Goues (Carnegie Mellon University)

Watch the keynote

Claire Le Goues shares recent advances in academic software engineering and programming languages research that aims to make automatic bug repair a reality, drawing on everything from metaheuristic search and program synthesis to machine learning and search over large databases of existing code.

Genji: A framework for building resilient near-real-time data pipelines

Swaminathan Sundaramurthy (Salesforce Inc), Mark Cho (Pinterest)

Download slides (PDF)

Pinterest has to support real-time decision making while operating on petabyte-scale data. Swaminathan Sundaramurthy and Mark Cho offer an overview of Pinterest's real-time data pipeline (modeled on a quasi-Kappa architecture), its impact on the company's systems, and the tools and processes used. They also demonstrate how Pinterest models real-time ads analytics on the platform.

Government is a system.

Matt Cutts (United States Digital Service (USDS))

Watch the keynote

In government, you can still find out-of-date tech practices like writing requirements for years or launching systems without monitoring. The government wants more effective technology. Meanwhile, everyone else wants a more effective government. Matt Cutts discusses how better technology can improve not just software systems but also trust in government itself.

How do you eat a whale? One byte at a time

Kelly Looney (Skytap)

View slides

Kelly Looney shares an incremental approach to introducing containers into complex, distributed applications—resulting in modernization with less risk and more reward. You’ll learn how to evaluate which components of your applications are best suited for containers, how to experiment safely and get fast feedback, and how to increase and scale your container adoption.

How LinkedIn determines the capacity limits of its services using live traffic

Susie Xia (LinkedIn), Anant Rao (LinkedIn)

Download slides (PDF)

Susie Xia and Anant Rao explain how LinkedIn leverages live production traffic to determine service and resource bottlenecks at scale with a tool called Redliner and how you can use your current architecture to do the same.

Instrumenting systems for arbitrary observability

Baron Schwartz (VividCortex)

Download slides (PDF)

Observability (or lack thereof), like testability and maintainability, is a fundamental property of systems. But what does observable code look like? What instrumentation creates systems that are observable later in arbitrary ways, in circumstances you can't foresee? Baron Schwartz outlines the most useful things to know about observability in systems in production.

Kubernetes training

Sébastien Goasguen (TriggerMesh)

Download slides (PDF)

Kubernetes, one of the highest-velocity projects on GitHub, is quickly becoming the leading platform on which to build distributed applications. Sébastien Goasguen offers a Kubernetes primer, covering the architecture of a Kubernetes installation, the API objects that make up a distributed application on Kubernetes, and more.

Managing server secrets at scale with a vaultless password manager

Ignat Korchagin (Cloudflare)

Download slides (PDF)

Ever wondered how to quickly and efficiently roll over all of your servers’ SSH keys or how to securely manage diskless systems? Ignat Korchagin outlines a simple approach that combines hardware support and a little cryptography to help operationalize the management of all the secrets in your cloud.
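
The abstract does not spell out the mechanism, so the following Python sketch only illustrates one general idea behind storing no secrets at all: deriving each secret deterministically from a single master key, here with HMAC-SHA256. The `derive_secret` function, the purpose strings, and the hard-coded master key are assumptions for illustration; in a hardware-backed setup the master key would presumably stay inside the hardware rather than in code.

```python
import hashlib
import hmac


def derive_secret(master_key: bytes, purpose: str, length: int = 32) -> bytes:
    """Derive a per-purpose secret from one master key (illustrative only)."""
    if not 1 <= length <= 32:
        raise ValueError("length must be between 1 and 32 bytes")
    digest = hmac.new(master_key, purpose.encode("utf-8"), hashlib.sha256).digest()
    return digest[:length]


# Assumption: a placeholder key; never hard-code a real master key.
master = b"example-master-key-not-for-production"

# The same inputs always give the same secret, so nothing needs to be stored.
ssh_v1 = derive_secret(master, "host42/ssh-host-key/v1")

# Changing the version in the purpose string yields an unrelated secret,
# which is one simple way to roll a key over.
ssh_v2 = derive_secret(master, "host42/ssh-host-key/v2")
```

Because derivation is deterministic, a diskless machine can recompute its secrets at boot instead of reading them from storage.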

Microservices secrets management with Vault

Seth Vargo (Google)

Download slides (PDF)

It’s great that you’ve moved to microservices, but how are you distributing secrets? Seth Vargo offers an overview of Vault’s unique approach to secret management by providing secrets as a service for your services (and your humans too), which is highly scalable and easily customizable to fit any environment.

Monitoring in the time of cloud native

Cindy Sridharan

Download slides (PDF)

As the systems we build become more distributed and (in the case of containerization) ephemeral, traditional monitoring tools prove to be grossly insufficient. Fortunately, the state of monitoring has evolved to meet these new demands, but it brings its own set of technical and organizational challenges. Cindy Sridharan offers an honest overview of monitoring challenges and trade-offs.

Running a massively parallel stream processing system at Netflix

Zhenzhong Xu (Netflix)

Download slides (PDF)

Keystone, a critical piece of Netflix's backend data infrastructure, ensures massive data movements and real-time event processing. Zhenzhong Xu leads a deep dive into Keystone's architecture and underlying stream processing engines, sharing insights and proven paths on how the company achieves multitenancy, scalability, and resilience in a complex cloud-native distributed system environment.

Struts 2, Equifax, and you: The story of the worst breach in history (sponsored by Contrast Security)

Arshan Dabirsiaghi (Contrast Security)

Download slides (DOCX)

Arshan Dabirsiaghi explains what Contrast Security learned from the Struts 2 exploit and details how to stop the next attack against your production apps.

Systems management with a voice UI using Amazon Alexa

Karthik Kirupanithi (Amazon Web Services)

Download slides (PDF)

Voice UIs like Amazon's Alexa can make systems management simple, intuitive, and delightful. The virtual private assistant feel of a VUI, coupled with the abstraction that voice commands bring, breaks the tedium of management tasks. Karthik Kirupanithi demonstrates how to put together an Alexa skill that can perform tasks using the EC2 Systems Manager.

The impact of design: How design influences outcomes

Cynthia Savard Saucier (Shopify)

Watch the keynote

We like to think that technology can make the world a better place, but we (conveniently) forget how it can make it worse. Primum non nocere (first do no harm) is the first concept taught in medical school, serving as a reminder of the possible harm that any intervention might do. Cynthia Savard Saucier challenges the tech industry to come up with its own fundamental principle.

The phone book is on fire: Lessons from the Dyn DNS DDoS attack

Lex Neva (Fastly)

View slides

When the DDoS attack crushed Dyn last October, did your DNS fail? Heroku's sure did. In response, Lex Neva deep dove into everything DNS to learn how to implement resilient DNS properly—reading RFCs, asking questions of pros, and performing real-world experiments when no one knew the answers. Join Lex to find out what does work and all the crazy details of DNS that he uncovered.

Thriving under a continuous self-inflicted DDoS attack

Kevin Beck (New Relic)

Download slides (1-PDF)

Download slides (2-PDF)

New Relic customers send monitoring data to New Relic servers every minute—a continuous firehose of data. Drawing on his experience at New Relic, Kevin Beck shares best practices for building a streaming service based on Apache Kafka, self-monitoring for reliability and fault tolerance, and building a DevOps culture that anticipates and prevents outages.

Unconventional programming paradigms for the future, now

Carin Meier (Cognitect)

Watch the keynote

As technology advances, our systems are growing more and more complex, reaching the threshold of what we can handle and even comprehend. We need more than tools to keep it under control. We need new ways of thinking. Carin Meier explores new ways to approach systems and tame complexity for the rapidly changing future.

You scream for microservices orchestration; I scream for batch; we all scream for jobs as code (sponsored by BMC Software)

Joe Goldberg (BMC Software)

Watch the keynote

Business transformation has led us to adopt new technologies and process and cultural changes. How batch application automation is built, tested, and run must evolve to keep pace. Joe Goldberg explores jobs as code, which looks at batch application automation from an SDLC perspective—an approach that embeds expectations within a modern automation platform.

Elite Sponsor

Google Cloud

Sponsorship Opportunities

For exhibition and sponsorship opportunities, email velocity@oreilly.com

Partner Opportunities

For information on trade opportunities with O'Reilly conferences, email partners@oreilly.com

Contact Us

View a complete list of Velocity contacts

©2017, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com