Build resilient systems at scale
May 27–29, 2015 • Santa Clara, CA

ops conference sessions

11:50am–12:30pm Thursday, 05/28/2015
Charity Majors (Honeycomb)
How to hire and grow an amazing Ops team and build a great DevOps culture from the ground up. Covers interviewing techniques, cultural fit, first hires for a new team, and how to build a culture people love.
4:10pm–4:50pm Friday, 05/29/2015
Gavin Towey (Box, Inc)
Box developed two open source tools for MySQL: Anemometer and Raingauge. Anemometer tracks query performance, gathers query execution data from many databases at once, and has a simple API for DevOps integrations. Raingauge captures forensic data from MySQL when unexpected events occur so you can diagnose and fix production problems quickly. Learn how these free tools can help you.
11:50am–12:30pm Thursday, 05/28/2015
Eric Sammer (Rocana)
Running modern large-scale dynamic datacenters without equally modern monitoring is a recipe for disaster. In this session, we'll explore an architecture using open source infrastructure proven to handle tens of terabytes an hour of log, metric, and other event-oriented data, with real-time collection, processing, analytics, and alerting.
1:45pm–2:25pm Friday, 05/29/2015
Seth Vargo (HashiCorp)
Slides:   1-PDF 
Consul by HashiCorp is an open source tool for service discovery, monitoring, and infrastructure configuration. Simple configuration and powerful features like high availability, failure detection, and multi-datacenter awareness make Consul a great solution for organizations of all sizes trying to scale their monitoring.
2:40pm–3:20pm Thursday, 05/28/2015
Tim Prendergast (Evident.io)
The rise of programmatic infrastructure and services has created a rift in the industry between business acceleration and risk aversion/mitigation. The rate at which technology teams consume, manipulate, and iterate infrastructure now far exceeds traditional security technologies. A new approach to security, a DevOps approach, can marry these aspects of business together again in amazing ways.
5:05pm–5:45pm Thursday, 05/28/2015
Laine Campbell (OpsArtisan)
Slides:   1-PDF 
IT is undergoing a revolution, and database administration is no exception. As Ops teams evolve into reliability engineers, developers and traditional systems administrators find themselves diving into the world of the DBA. In this session, we take the model of site reliability engineering and guide you through the components of the craft of DB architecture/operations in that context.
5:05pm–5:45pm Friday, 05/29/2015
Dave Cliffe (PagerDuty), Arup Chakrabarti (PagerDuty)
We will present a framework for understanding operational incident response, along with practical ways to understand impact, manage responsiveness (and burnout), coordinate incident command, and leverage the right level of collaboration. We hope that this will make a meaningful impact on how you manage incidents, from the duckiest to the fowl-est of them.
11:50am–12:30pm Friday, 05/29/2015
Steve Hoffman (Orbitz Worldwide), Rick Fast (Expedia Inc.)
Slides:   1-ZIP 
In this talk we will discuss how we enabled decomposition of one of our 250+ system components into a continuously deployed microservice cluster using Docker, Consul, the ELK stack, and Graphite. We will discuss the architecture and supporting services as well as the continuous delivery from source to production via Ansible and Jenkins.
2:40pm–3:20pm Thursday, 05/28/2015
Ian Malpass (Etsy)
Slides:   1-PDF 
Failure is inevitable. Wait! Come back! It's OK. If things are going to go wrong, what do you do? I'll tell you about Etsy's approach to failure: how it influences our tools and our philosophy, and how we try to minimise the cost of failure.
4:30pm–4:50pm Wednesday, 05/27/2015
Jeff Sussna (Ingineering.IT)
Slides:   1-PDF 
The complex, co-creative nature of digital services means we can’t fully know how our designs will work until we deploy them. Operations needs to be an input to design as well as an output. This talk will present continuous design as an extension of DevOps, and describe concrete ways to create a circular design-operations loop that unifies marketing, design, development, operations, and support.
2:35pm–2:55pm Wednesday, 05/27/2015
Benny Wong (Timehop)
Slides:   1-PDF 
You've read everything on Hacker News. You've deployed a bunch of apps on Heroku. You've signed up for as many stats and APM services as you know of. You have everything you need to scale if your startup blows up, right? In this talk, we'll walk through some of the lessons we've learned the hard way growing 13x over the past 9 months.
1:45pm–2:25pm Friday, 05/29/2015
Adam Auerbach (Capital One)
In a highly integrated environment with agile teams having many dependencies, Capital One had the typical rigorous release and change management process you would expect at a large national bank. Through the adoption of DevOps and other “shift left” enablers, Capital One was able to automate these processes and quality gates, to enable release-on-demand and eventually continuous delivery.
2:40pm–3:20pm Thursday, 05/28/2015
David Genn (IG Group)
How do you migrate to a continuous delivery model in the highly regulated world of financial trading? In this talk, we look at how IG Group have started this process to allow us to break away from monthly releases, and get features into production faster whilst protecting uptime and stability.
11:50am–12:30pm Friday, 05/29/2015
Kelsey Hightower (Google)
The last decade belonged to virtual machines; the next one belongs to containers. It is time to look at new ways to deploy and manage applications at scale. CoreOS is a new Linux distribution designed specifically for application containers and running them at scale. This talk will examine all the major components of CoreOS (etcd, fleet, docker, systemd) and how these components work together.
11:50am–12:30pm Thursday, 05/28/2015
Jen Andre (Komand)
Are you using Docker today or looking to dip your toes in? Maybe you’ve heard some debate about whether or not Docker is ‘secure’ enough for production deployments. What does this mean? Jen will give you an overview of the Docker security model, a dive into the potential risks, and the tools that are available within the Docker ecosystem to help run Docker containers securely.
4:10pm–4:50pm Thursday, 05/28/2015
Mandi Walls (Chef)
Slides:   1-ZIP 
This follow-up to my 2012 Velocity talk, “Challenges to Cultural Change,” will examine a number of common themes presenting in technology organizations of varying size. These pathologies exhibit strengths and weaknesses around tasks, behaviors, and treatment of personnel that affect the day-to-day running, and long-term success, of IT projects.
5:05pm–5:45pm Thursday, 05/28/2015
Mike Arpaia (Facebook)
There's a common misconception in information security that trade secrets, institutional knowledge, and internal software all need to stay secret in order to maintain a strong level of security and safety from malicious hackers. In this session, we'll discuss osquery, a popular Facebook open source project, which supports organizations taking their security into their own hands.
9:45am–10:05am Wednesday, 05/27/2015
J. Paul Reed (Release Engineering Approaches)
The DevOps explosion has viscerally illustrated the need to move toward leaner, more nimble, real-time business operations. But change doesn't happen overnight and will fail without support from the entire organization.
4:10pm–4:50pm Thursday, 05/28/2015
Aaron Peters (TurboBytes), Kyle Young (Mobify)
Slides:   1-PDF 
CDNs fail regularly and in many different ways, varying from a small, local increase in latency to global unavailability. In this data-filled talk, we’ll present convincing evidence for why you need to prep for CDN failure, and practical guidance for monitoring CDN performance and implementing a multi-CDN strategy.
1:45pm–2:25pm Thursday, 05/28/2015
Arun Kejariwal (Machine Zone), Sailesh Mittal (Twitter), Karthik Ramasamy (Twitter)
Data-driven decision making rests, in part, on the availability of high-fidelity data. The presence of anomalies limits the use of data on an “as is” basis. Automatic anomaly detection is key to providing high-fidelity data. We present a statistically rigorous method for automatic anomaly detection, which leverages correlations between multiple time series.
2:40pm–3:20pm Friday, 05/29/2015
Baron Schwartz (VividCortex)
Slides:   1-PDF 
This is a story of how systems taught me about people, and people taught me about systems. Once you see it, you can't unsee it: systems and teams both do work, and have the same types of bottlenecks. The real breakthrough is figuring out what to do about it, and that's where systems and people are very different. I'll share in detail.
2:40pm–3:20pm Friday, 05/29/2015
Bill Green (New Relic)
The network forms a critical part of the user experience, whether you run the most popular site on the internet or a single database on an office LAN. How much visibility into this critical part of your infrastructure do you really have? Polling, traps, and logs are good, but today they aren’t enough. Learn how network flow data can bring your own network mysteries into sharp focus.
1:45pm–2:25pm Thursday, 05/28/2015
Mark Maun (Ticketmaster)
Slides:   external link
Mark will talk about his journey at Ticketmaster, where he rolled out tools and automation, and not only made the Ticketmaster software factory more efficient but also invoked cultural change in the workplace. Rather than mandate cultural change by executive fiat, we did it organically from the ground up. Tools and technology, not executive memos, were the origin of this transformation.
5:05pm–5:45pm Thursday, 05/28/2015
Robert Treat (OmniTI)
If you think there is a gap between Dev and Ops, Design must seem like it comes from another planet. But many of the core ideas behind design work are directly applicable to the world of Ops, especially given that running successful websites is no longer (just) about keeping servers up, but about designing human and technical systems that help enable people to achieve continuous operations.
4:10pm–4:50pm Thursday, 05/28/2015
Tim Sheiner (Jut)
Slides:   1-PDF 
Get a UX professional's view of the operations world... it's like a look in the mirror, except you'll walk away with tangible ways to rethink how you communicate about your challenges, your work, and the opportunities for Ops within your business.
5:05pm–5:45pm Thursday, 05/28/2015
Aneel Lakhani (SignalFx)
Slides:   1-PDF 
Because we want to win. OODA - Observe, Orient, Decide, Act - is pop-tech-devOps-unicorn-buzzword-cargo-culting canon. But the idea is more interesting than the myth. It's not about moving fast, faster, fastest. It's about changing the game being played. It’s about shrinking the time to do some things so you can spend more time doing other things. It’s about the time we have and where we spend it.