Build Systems that Drive Business

Sep 30–Oct 1, 2018: Training
Oct 1–3, 2018: Tutorials & Conference

New York, NY

Speaker slides & video

Presentation slides will be made available after the session has concluded and the speaker has given us the files. Check back if you don't see the file you're looking for—it might be available later! (However, please note some speakers choose not to share their presentations.)

If you are looking for slides and video from 2017, visit the Velocity New York 2017 site.

"Not invented here" syndrome and dark debt: The PagerDuty story

Aish Raj Dahal (PagerDuty)

Download slides (1-FILE)

Download slides (2-PDF)

Finding the right balance between writing custom in-house software and using an off-the-shelf solution is difficult. Aish Raj Dahal sheds light on the age-old build-versus-buy problem and "not invented here" syndrome by explaining how PagerDuty built a distributed task scheduler and later replaced it with an off-the-shelf open source solution.

60,000 tests in six minutes: Create a reliable pipeline, eliminate flaky tests, and deploy safely but quickly

Sam Guckenheimer (Microsoft)

Download slides (PDF)

Good test coverage is essential for catching issues before a pull request is merged, but the tests have to be the right kind and must be reliable. Drawing on his experience at Microsoft, Sam Guckenheimer details what types of tests to run in your DevOps pipeline, when you should run them, and why.

A programmer's guide to secure connections

Liz Rice (Aqua Security)

Download slides (1-PDF)

View slides

Beyond looking out for a little green padlock in the browser bar, what do you need to know about secure connections as a programmer? What do people mean by terms like authentication, verifying a certificate, or signing a message? Join Liz Rice as she demystifies HTTPS, TLS, X.509, and more.

Ansible for SRE teams

James Meickle (Quantopian)

Download slides (PDF)

Ansible is a "batteries included" automation, configuration management, and orchestration tool that's fast to learn and flexible enough for any architecture. Join James Meickle to get started with Ansible, with an eye toward sustainable development in cloud environments.

Attack trees: Security modeling for Agile teams

Michael Brunton-Spall (Bruntonspall Ltd)

Download slides (PPTX)

Traditional security approaches to threat and risk management are highly optimized to work within a traditional software development lifecycle. Michael Brunton-Spall shares a new approach to reviewing systems along with real-life examples to help you prioritize where to focus security efforts and what sorts of security threats you should worry about.

Availability, latency, and cost: Withstanding regional outages

Aaron Blohowiak (Netflix)

Download slides (PDF)

Multiregion deployments can improve availability and latency and can cost way less than you think. Aaron Blohowiak dives into his experience operating in multiple regions at scale at Netflix and shares the algebraic models, code, and incident management playbooks the company has developed to tame, refine, and leverage its approach.

Beyond accidental architecture

James Thompson (Mavenlink)

Download slides (ZIP)

Accidental architecture is a product of circumstances rather than deliberate development toward a goal. James Thompson explains why it's best addressed by equipping teams to make more deliberate and informed technical decisions.

Building successful site reliability engineering in large enterprises

Liz Fong-Jones (Honeycomb), Dave Rensin (Google)

Watch the keynote

Download slides (PDF)

Implementing site reliability engineering (SRE) doesn't have to be intimidating, and it isn't only for cloud-native organizations. Liz Fong-Jones and Dave Rensin share eight key lessons Google's customer reliability engineering team learned helping large enterprises adopt SRE as an operations engineering model.

Bulk image processing using Kubernetes

Mike Newswanger (Elastic)

Download slides (PDF)

Mike Newswanger explains how he used Kubernetes and Google Cloud to burst and extend the capacity of a physical infrastructure for optimizing almost 10 million images in less than two weeks.

Communicating and managing change

Rocio Delgado (Slack)

Download slides (PDF)

Evolving teams and evolving companies are a constant in the career of a leader; helping your team navigate through that change becomes critical to your success as a manager and for the organization. Rocio Delgado shares dos and don'ts for managing and communicating change in your team or organization, which may highlight where your own skills need to evolve.

Consuming cloud services with the Kubernetes Service Catalog

Neil Peterson (Microsoft)

Download slides (PPTX)

Neil Peterson leads a technical deep dive into using the Kubernetes Service Catalog to dynamically provision and consume managed cloud services.

Continuous Disintegration

Anil Dash (Fog Creek Software)

Watch the keynote

As our industry faces its biggest reckoning ever with the social, ethical and cultural impacts of technology, what can we learn if we reflect on the assumptions we build into our systems? How could our processes and tools be designed to undo the biggest bugs and biases of today’s tech?

Creating an Effective Developer Experience on Kubernetes

Daniel Bryant (Datawire)

Download slides (PDF)

Join this talk to learn how to curate your perfect developer experience using Kubernetes.

Disaster resilience the Waffle House way, from flattops to feature flags and more

Heidi Waterhouse (LaunchDarkly)

View slides

Waffle House's hurricane disaster plan has everything you could want from an IT disaster plan, including contact trees, failover states, and runbooks on partial operation. Heidi Waterhouse shares lessons about state drawn from the world outside computers and explains how to quantify them using a finite state machine and implement them automatically while you are in a less-than-perfect condition.
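
As a rough illustration of the finite-state-machine idea, here is a minimal Python sketch. The mode names, events, and transition table are hypothetical, invented for this example; they are not taken from the talk or from Waffle House's actual plan.

```python
from enum import Enum


class Mode(Enum):
    # Hypothetical operating modes, named for illustration only.
    FULL = "full service"
    LIMITED = "limited menu"
    CLOSED = "closed"


# Hypothetical transition table: (current mode, event) -> next mode.
TRANSITIONS = {
    (Mode.FULL, "power_lost"): Mode.LIMITED,
    (Mode.FULL, "unsafe_conditions"): Mode.CLOSED,
    (Mode.LIMITED, "power_restored"): Mode.FULL,
    (Mode.LIMITED, "unsafe_conditions"): Mode.CLOSED,
    (Mode.CLOSED, "all_clear"): Mode.LIMITED,
}


def next_mode(current: Mode, event: str) -> Mode:
    """Return the next mode; stay in the current mode if the event does not apply."""
    return TRANSITIONS.get((current, event), current)


def run(events, start=Mode.FULL):
    """Apply a sequence of events in order and return the final mode."""
    mode = start
    for event in events:
        mode = next_mode(mode, event)
    return mode
```

Keeping the transitions in a plain table means the "runbook" can be reviewed in one place, and an event that has no entry simply leaves the system in its current, known state.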

Faster is safer: Security in the enterprise

Molly Crowther (Pivotal)

Download slides (ZIP)

Molly Crowther demonstrates how the enterprise can use cloud platforms to make security move at the pace of business—not the other way around.

Frankenstein's microservices: How to avoid the monster

Michael Hamrah (Namely)

Download slides (PPTX)

Many companies adopt microservices to break down monoliths, but they soon uncover a hidden cost: How do you manage all these new interconnected things popping up? Michael Hamrah explains how to avoid creating Frankenstein's monster by understanding elements of a microservice platform...so you can sleep at night.

From silos to a single pane of glass at USA TODAY NETWORK

Bridget Lane (Gannett | USA Today), Kris Vincent (Gannett | USA Today)

Download slides (ZIP)

Three years ago, technical teams at USA TODAY NETWORK were completely siloed, making improvements and troubleshooting difficult and often blind to the rest of the technical organization. Bridget Lane and Kris Vincent explain how drastically the teams' tool belts, thought processes, and goals have changed as the company moved from silos to a single pane of glass.

How do we solve the world's spreadsheet problem?

Alexander Rasmussen (Freenome)

Download slides (PDF)

In the past five years, Alexander Rasmussen has spent a lot of time trying to get high-integrity data out of spreadsheets and into databases. Alexander explores common data integrity problems when dealing with spreadsheet data, investigates whether those integrity problems are inescapable, and shares ongoing work to mitigate them.

How NTSB air disaster analysis can help you in an emergency

Matt Rogish (ReactiveOps)

View slides

Matt Rogish explains how NTSB investigations of air disasters have dramatically improved flight safety. He then applies lessons from disaster recovery and analysis, teamwork, task saturation, and systems design to modern software and infrastructure architecture at scale, with the aim of higher availability, fewer errors, and more scalable systems.

How to break up with your vendor

Amy Nguyen (Stripe), Cory Watson (Stripe)

View slides

You're unsatisfied with one of your monitoring providers. You've considered finding a new solution, but the thought of migrating your data off their platform sounds extremely painful. Amy Nguyen and Cory Watson explain how to make a deadline for an infrastructure-critical software migration while ensuring that everyone's requirements are met and no data has been lost.

How to get away with refactoring

Maude Lemaire (Slack Technologies, Inc.)

Download slides (PDF)

How do you refactor major, core functionality in a million-line codebase without disrupting the entire system? Maude Lemaire explains how Slack overhauled channels and shares the many obstacles the company overcame to boost both application performance and company-wide developer productivity (with only a few hiccups).

Integrating developer and operator experience in Kubernetes

Brendan Burns (Microsoft)

Download slides (PDF)

Developer and operator personas are often viewed as separate, but the truth on the ground is actually far more mixed. Developers often operate their own software, and operators often explore software to find and fix bugs. Brendan Burns covers this overlap, explaining how to build tooling and approaches that enable developers and operators to quickly switch or blend between the personas.

Knative: Kubernetes, serverless, and you

Ryan Gregg (Google)

Download slides (PDF)

It's a Kubernetes world. Join Ryan Gregg to learn about Knative, an open source collaboration between Google and other industry leaders to define the future of serverless on Kubernetes. Knative solves the difficult but boring aspects of running modern cloud applications on Kubernetes.

Kubernetes bootcamp: Deploying and scaling microservices

Jérôme Petazzoni (Tiny Shell Script LLC)

Download slides (PDF)

Kubernetes has a reputation for being complex to set up and operate, but that doesn't have to be the case. Join Jérôme Petazzoni to explore Kubernetes concepts and architecture and learn how to use it to deploy and scale your applications. The content is suitable for all kinds of deployment models, from the cloud (AKS, EKS, GKE, kops, etc.) to on-premises.

Kubernetes: Crossing the chasm

Ian Crosby (Container Solutions)

Download slides (PDF)

As Kubernetes enters the mainstream market, we are seeing more use cases that don't fit the original mold, each bringing a new set of challenges. Ian Crosby discusses three specific case studies, the challenges encountered adopting Kubernetes, and the solutions and tooling used to solve them.

Lessons learned migrating HealthCare.gov to Terraform

Christian Monaghan (HealthCare.gov | Nava PBC)

Download slides (PDF)

Christian Monaghan explains how he and his team successfully migrated HealthCare.gov, America's largest government website, to the cloud infrastructure provisioning tool Terraform, shares lessons learned along the way, and details how you can effectively use Terraform for your next project.

Managing multiple sources of truth in distributed applications

Adam Wolfe Gordon (DigitalOcean)

Download slides (PDF)

When building distributed applications, it's highly desirable to maintain a single source of truth, such as a database, for all application state. Unfortunately, for some applications, multiple sources of truth are unavoidable. Adam Wolfe Gordon shares strategies, learned from real-world experience, for managing multiple sources of truth without sacrificing consistency and usability.

Migrating Spotify's runtime to Kubernetes

James Wen (Spotify)

View slides

Spotify recently finished migrating all of its services from bare-metal hardware to cloud hosts on GCP. The company is now moving from merely cloud hosted to cloud native by migrating those services to Kubernetes. James Wen discusses the work involved, lessons learned, and pitfalls encountered in moving services onto Kubernetes.

ML on code: Machine learning will change programming

Francesc Campoy (Dgraph)

Watch the keynote

Download slides (PDF)

Machine learning has revolutionized many fields, from cancer detection to self-driving cars. And let's not forget about connected toilets that allow Alexa to flush at your command. Francesc Campoy Flores explores some of the techniques used and the most relevant research, focusing on use cases where machine learning can help developers be more efficient.

Pat Helland and me: How to build stateful distributed applications that can scale almost infinitely

Sean Allen (Wallaroo Labs)

Download slides (PDF)

In 2007, Pat Helland published "Life Beyond Distributed Transactions: An Apostate’s Opinion," in which he conducts a thought experiment on how to design a distributed database that can scale almost infinitely. While the paper explicitly addresses distributed database design, Sean Allen shows that the ideas are far more widely applicable, particularly in scaling stateful applications.

Performance anomaly detection at scale (sponsored by Salesforce)

Tuli Nivas (Salesforce)

Download slides (PPTX)

Automated anomaly detection in production using simple data science techniques enables you to more quickly identify an issue and reduce the time it takes to get customers out of an outage. Tuli Nivas shows how to apply simple statistics to change how performance data is viewed and how to easily and effectively identify issues in production.
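
One simple statistic that fits this description is the z-score: how many standard deviations a sample sits from the mean. The sketch below is an assumption about what "simple statistics" could look like, not the method presented in the talk; the function name and the default threshold of 3.0 are illustrative choices.

```python
from statistics import mean, stdev


def find_anomalies(samples, threshold=3.0):
    """Return (index, value) pairs lying more than `threshold`
    sample standard deviations away from the mean."""
    if len(samples) < 2:
        return []  # stdev is undefined for fewer than two samples
    mu = mean(samples)
    sigma = stdev(samples)
    if sigma == 0:
        return []  # all samples identical, so nothing stands out
    return [
        (i, x)
        for i, x in enumerate(samples)
        if abs(x - mu) / sigma > threshold
    ]


# Hypothetical response times in milliseconds: steady traffic, then a spike.
latencies_ms = [100] * 20 + [500]
spikes = find_anomalies(latencies_ms)
```

A plain z-score is sensitive to the very outliers it is trying to find, so in practice a rolling window or a median-based measure may behave better on noisy production data.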

Practical performance theory

Kavya Joshi (Samsara)

Watch the keynote

View slides

Performance theory offers a rigorous and practical approach to performance tuning and capacity planning. Kavya Joshi dives into elegant results like Little’s law and the Universal Scalability Law. You'll also discover how performance theory is used in real systems at companies like Facebook and learn how to leverage it to prepare your systems for flux and scale.
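
For readers unfamiliar with the two results named above, here is a small Python sketch. Little's law states that average concurrency equals arrival rate times average time in the system; the Universal Scalability Law models relative capacity at N workers as N / (1 + α(N − 1) + βN(N − 1)). The function names and the coefficient values used below are illustrative assumptions, not figures from the talk.

```python
def littles_law_concurrency(arrival_rate, mean_latency):
    """Little's law: L = lambda * W.

    arrival_rate is in requests per second, mean_latency in seconds;
    the result is the average number of requests in flight.
    """
    return arrival_rate * mean_latency


def usl_capacity(n, alpha, beta):
    """Universal Scalability Law: relative capacity at n workers.

    alpha models contention (serialized work); beta models
    coherency cost (crosstalk between workers).
    """
    return n / (1 + alpha * (n - 1) + beta * n * (n - 1))


# Hypothetical service: 200 requests/s at 50 ms mean latency.
in_flight = littles_law_concurrency(200, 0.05)

# Hypothetical coefficients: capacity peaks and then declines as workers are added.
at_32 = usl_capacity(32, alpha=0.05, beta=0.001)
at_64 = usl_capacity(64, alpha=0.05, beta=0.001)
```

With these made-up coefficients, doubling from 32 to 64 workers lowers modeled capacity, which is the retrograde behavior the USL is meant to capture.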

Rebuilding the airplane in flight. . .safely

Shannon Weyrick (NS1), James Royalty (NS1)

Download slides (PDF)

Rewriting the key software component of your platform from scratch is always intimidating. Shannon Weyrick and James Royalty discuss NS1's recent DNS server rewrite and outline the steps the company took to roll it out across its globally distributed network with no downtime.

Revisiting HTTP/2

Hooman Beheshti (Fastly)

View slides

Now that adoption is ramped up and HTTP/2 is being regularly used on the internet, it's a good time to revisit the protocol and its deployment. Hooman Beheshti reviews protocol basics and digs into core features such as interaction with TCP, server push, priorities and dependencies, and HPACK.

Sell cron, buy Airflow: Modern data pipelines in finance

James Meickle (Quantopian)

Download slides (PDF)

Quantopian integrates financial data from vendors around the globe. As the scope of its operations outgrew cron, the company turned to Apache Airflow, a distributed scheduler and task executor. James Meickle explains how in less than six months, Quantopian was able to rearchitect brittle crontabs into resilient, recoverable pipelines defined in code to which anyone could contribute.

Serverless APIs with AWS Lambda and API Gateway

Bill Boulden (ClearView Social)

View slides

Serverless architectures remove load from web servers and scale flawlessly to handle any volume while keeping you from paying for an instant of wasted idle time. Bill Boulden walks you through creating a functioning serverless API that coexists alongside conventionally served web pages using AWS Lambda and API Gateway.

SLO burn

Jamie Wilkinson (Google)

Download slides (PDF)

Jamie Wilkinson offers a brief overview of SLOs, shares a practical guide to implementing sustainable SLO-based alerting for systems of any size, and outlines the tooling required to supplement the system in the absence of cause-based alerting.
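
A common building block for SLO-based alerting is the burn rate: the observed error rate divided by the error budget that the SLO allows. The sketch below assumes that definition; the function names and the paging threshold are hypothetical and are not taken from the talk.

```python
def burn_rate(errors, requests, slo_target):
    """Return how fast the error budget is being consumed.

    A value of 1.0 means errors arrive at exactly the rate the SLO
    permits; values above 1.0 mean the budget will run out early.
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if requests == 0:
        return 0.0  # no traffic, so no budget consumed
    error_budget = 1.0 - slo_target
    return (errors / requests) / error_budget


def should_page(errors, requests, slo_target, threshold):
    """Page only when the burn rate exceeds a chosen threshold (an assumption)."""
    return burn_rate(errors, requests, slo_target) > threshold


# Hypothetical window: 200 failed requests out of 10,000 against a 99.9% SLO.
rate = burn_rate(200, 10_000, 0.999)
```

Alerting on burn rate rather than on raw error counts ties the page to the symptom users experience, which is why it can work for systems of very different sizes.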

Small-scale engineering

Effie Mouzeli (Wikimedia Foundation)

Download slides (PDF)

Effie Mouzeli explains why small-scale engineering is just as challenging as large-scale engineering and offers ideas on how to survive technical debt, poor communication, and other everyday challenges.

Smart networking with service meshes

Anubhav Mishra (HashiCorp)

Download slides (PDF)

Over the past year, service meshes have gained significant interest. Most service meshes have two components: a control plane and a data plane. Anubhav Mishra explains what it takes to build a scalable control and data plane. Anubhav also discusses how HashiCorp Consul provides many features like a distributed key-value store and service discovery that make it ideal for a control plane.

Smooth scaling: Slack’s journey toward a new database

Ameet Kotian (Slack)

Download slides (ZIP)

Slack’s rapid growth over the last few years outpaced the original database’s scaling capacity, which negatively impacted the company's customers and engineers. Ameet Kotian explains how a small team of engineers embarked on a journey for the right database solution, which eventually led them to Vitess, an open source cluster database.

Strategies for better technical interviews

Moishe Lettvin (MailChimp)

Download slides (PPTX)

Technical interviewing is profoundly important, but unfortunately, it's easy to do poorly and very difficult to do well. Moishe Lettvin outlines strategies for reducing bias and increasing the fidelity of your technical interviews.

Switching horses midstream: The challenges of migrating 150+ microservices to Kubernetes

Sarah Wells (Financial Times)

Download slides (PPTX)

The Financial Times recently migrated its content platform to Kubernetes. Join Sarah Wells to find out what it takes to migrate 150+ microservices from one container stack to another without affecting the existing production users and while the rest of your teams are working on delivering new functionality.

Test, measure, iterate: Balancing “good enough” and “perfect” in the critical path (sponsored by NS1)

Kris Beevers (NS1)

Watch the keynote

In critical path services such as DNS, stability is imperative above all else. Kris Beevers examines the trade-offs between risk and velocity faced by any high-growth, critical path technology business.

The simply complex task of implementing Kubernetes ingress: Lessons learned

Richard Li (Datawire)

Download slides (PDF)

Getting traffic into a Kubernetes cluster should be simple, but it’s not. The range of options can be confusing, and implementing effective configuration is equally challenging. Richard Li discusses the evolution of ingress on Kubernetes, explains why ingress controllers aren’t necessarily the best approach, and shares a series of lessons learned about managing traffic ingress.

Tracing polyglot systems: An OpenTracing tutorial

Yuri Shkuro (Uber Technologies), Prithvi Raj (Uber), Won Jun Jang (Uber)

Download slides (PDF)

Priyanka Sharma and Yuri Shkuro demonstrate how distributed tracing works and how to employ it in the development and operations of your applications in the programming language of your choice: Java, Go, Python, Node.js, C#, or C++.

Troubleshooting Kubernetes applications

Michael Hausenblas (AWS)

View slides

Michael Hausenblas walks you through troubleshooting applications running in Kubernetes, from application-level debugging to distributed tracing to chaos engineering.

Using distributed trace data to solve performance and operational challenges

Naoman Abbas (Pinterest)

Download slides (ZIP)

Naoman Abbas offers an overview of tools Pinterest built to process trace data and the use cases they’ve enabled and shares some real-world examples. Join in to learn how to apply these techniques to your own challenges.

Who guards the guardians? Designing for resilience in cluster orchestrators

Preetha Appan (HashiCorp)

Download slides (PDF)

Preetha Appan outlines various failure modes ranging from network failures to entire server failures in Nomad, an open source scheduler that supports heterogeneous workloads.

You've been arrested by the CAP; you have the right to remain consistent.

Aviran Mordo (Wix.com)

View slides

Aviran Mordo discusses the challenges and real-life use cases of handling data in a distributed environment.

Zero to Kubernetes in five minutes (sponsored by Mesosphere)

Dan Mennell (Mesosphere)

Download slides (1-PPT)

Download slides (2-ZIP)

Getting Kubernetes up and running is only half the battle. Now you need to get the supporting infrastructure in place. Dan Mennell shares a templated approach to deploying what is needed to get started with source control, CI/CD, and monitoring with Prometheus, along with other things.

©2018, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com