Build Systems that Drive Business
Sep 30–Oct 1, 2018: Training
Oct 1–3, 2018: Tutorials & Conference
New York, NY

Speakers

Hear from a wide range of talented senior engineers, systems practitioners, and technical managers who are doing amazing things in distributed systems and DevOps. More speakers will be announced; please check back for updates.

Naoman Abbas is an engineering manager for the visibility team at Pinterest, which is responsible for building and maintaining monitoring tools like the company’s metrics system, logsearch, and distributed tracing. Previously, Naoman was a software engineer building cloud platform components at Netflix and Microsoft.

Presentations

Using distributed trace data to solve performance and operational challenges Session

Naoman Abbas offers an overview of tools Pinterest built to process trace data and the use cases they’ve enabled and shares some real-world examples. Join in to learn how to apply these techniques to your own challenges.

Sean T. Allen is vice president of engineering at Wallaroo Labs and a member of the Pony core team. His turn-ons include programming languages, distributed computing, Hiwatt amplifiers, and Fender Telecasters. His turn-offs include mayonnaise, stirring yogurt, and sloppy code. He’s one of the authors of Storm Applied.

Presentations

Pat Helland and me: How to build stateful distributed applications that can scale almost infinitely Session

In 2007, Pat Helland published "Life Beyond Distributed Transactions: An Apostate’s Opinion," in which he conducts a thought experiment on how to design a distributed database that can scale almost infinitely. While the paper explicitly addresses distributed database design, Sean Allen shows that the ideas are far more widely applicable, particularly in scaling stateful applications.

Preetha Appan is a software engineer on the Nomad team at HashiCorp, most recently working on scheduler internals. Previously, she worked on various Consul features toward Consul 1.0 at HashiCorp and was an early engineer at Indeed.com, where she built distributed systems for search and recommendations from the ground up.

Presentations

Who guards the guardians? Designing for resilience in cluster orchestrators Session

Preetha Appan outlines various failure modes ranging from network failures to entire server failures in Nomad, an open source scheduler that supports heterogeneous workloads.

Matt Baldwin is the founder and CEO of StackPointCloud, a company focused on Kubernetes, Istio, and cloud-native technology. Matt is one of the creators of Stackpoint.io, an automation platform for cloud-native workloads, allowing users to span across multiple cloud providers. He also started and runs the largest Kubernetes community in the world, spanning Seattle, San Francisco, New York City, Los Angeles, Chicago, and Berlin.

Presentations

Multicloud, multiregion cross-cluster communication with Istio Session

Matt Baldwin discusses multicloud, multiregion cross-cluster communication with Istio.

Kris Beevers is CEO at NS1, where he leads the company’s team of industry experts as they create products to enable companies to use DNS to build and deliver dynamic, distributed, and automated applications that delight users. Previously, he built CDN, cloud, bare-metal, and other infrastructure products at Voxel (acquired by Internap in 2011). Kris is a recognized authority on DNS and global application delivery and often speaks and writes about building and deploying high-performance, at-scale, globally distributed internet infrastructure. He holds a PhD in computer science from RPI.

Presentations

Test, measure, iterate: Balancing “good enough” and “perfect” in the critical path (sponsored by NS1) Keynote

In critical path services such as DNS, stability is imperative above all else. Kris Beevers examines the trade-offs between risk and velocity faced by any high-growth, critical path technology business.

Hooman Beheshti is vice president of technology at Fastly, where he develops web performance services for the world’s smartest CDN platform. A pioneer in the application acceleration space, Hooman helped design one of the original load balancers while at Radware and has held senior technology positions with Strangeloop Networks and Crescendo Networks. He has worked on the core technologies that make the internet work faster for nearly 20 years and is an expert and frequent speaker on the subjects of load balancing, application performance, and content delivery networks.

Presentations

Revisiting HTTP/2 Session

Now that adoption is ramped up and HTTP/2 is being regularly used on the internet, it's a good time to revisit the protocol and its deployment. Hooman Beheshti reviews protocol basics and digs into core features such as interaction with TCP, server push, priorities and dependencies, and HPACK.

Kristina Bennett is a software engineer on the customer reliability engineering team at Google, where she helps support the team’s mission to “SRE everyone else.” Previously, she spent five years working on data integrity across Google.

Presentations

Trade-offs in resiliency: Managing the burden of data recoverability Session

Kristina Bennett shares best practices for practical data recoverability and shines a light onto some of the pitfalls awaiting the unwary, based on lessons learned from five years of data integrity tooling and consulting across Google.

Michael Bernstein is cofounder at Reify, a marketing and sales consultancy specializing in B2B software. Previously, Michael was vice president of community at Code Climate, where he helped the company grow through seed and A-round funding, from 1 to nearly 20 employees, and from low hundreds of thousands to millions in ARR.

Presentations

Why marketing matters Keynote

For many open source developers, marketing can seem like a scam—pushing terrible software from one side of the mouth while ruining good software with the other. Michael Bernstein offers an unflinching look at some of the fallacies that developers believe about marketing.

Ria Bhatia is a program manager for Azure on the cloud-native compute team at Microsoft. She is a maintainer of the new open source project Virtual Kubelet and works on Azure Container Instances. Ria holds a BS in computer science from Penn State. She lives in Seattle and loves anything dog related.

Presentations

Scaling in Kubernetes, matched to music Session

Ria Bhatia explains what it takes to build up the cluster autoscaler and horizontal pod autoscaler from the operations perspective, incorporating experiments and tests that were run to come up with solutions for appropriately tweaking metrics so scaling is cost effective and efficient.

Aaron Blohowiak is a senior software engineer on the traffic team at Netflix, where he is applying his passion for empiricism and system design to multiregion high-availability architecture and operations. Aaron has been building, breaking, and fixing systems for over a decade from tiny startups to serving over 100M users at Netflix. He is the coauthor of Chaos Engineering.

Presentations

Availability, latency, and cost: Withstanding regional outages Session

Multiregion deployments can improve availability and latency and can cost way less than you think. Aaron Blohowiak dives into his experience operating in multiple regions at scale at Netflix and shares the algebraic models, code, and incident management playbooks the company has developed to tame, refine, and leverage its approach.

Bill Boulden is the chief technology officer of ClearView Social, where he has migrated a VM-driven infrastructure to an autoscaling application fleet with serverless components. Bill has been developing software since the age of six. Previously, he was an API architect at Delaware North Companies. Running serverless applications in production has given him a unique perspective on architecture and application delivery for modern companies. In his spare time, he’s a pink-haired house music DJ by the name of Spruke, who enjoys EDM and generative ambient music.

Presentations

Serverless APIs with AWS Lambda and API Gateway Tutorial

Serverless architectures remove load from web servers and scale flawlessly to handle any volume while keeping you from paying for an instant of wasted idle time. Bill Boulden walks you through creating a functioning serverless API that coexists alongside conventionally served web pages using AWS Lambda and API Gateway.

Michael Brunton-Spall is an independent security consultant. Previously, Michael was deputy director for technology and operations and head of cybersecurity at the UK Government Digital Service and held a number of jobs ranging from creating low-level embedded hardware to gaming development on consoles to scaling and operating the Guardian newspaper. He is a regular conference speaker, the author of Agile Application Security, and an enthusiastic Agilist and security geek.

Presentations

Attack trees: Security modeling for Agile teams Tutorial

Traditional security approaches to threat and risk management are highly optimized to work within a traditional software development lifecycle. Michael Brunton-Spall shares a new approach to reviewing systems along with real-life examples to help you prioritize where to focus security efforts and what sorts of security threats you should worry about.

Daniel Bryant is an independent technical consultant and product architect at Datawire, where he specializes in enabling continuous delivery within organizations through the identification of value streams, the creation of build pipelines, and the implementation of effective testing strategies. Daniel’s technical expertise focuses on DevOps tooling, cloud and container platforms, and microservice implementations. He contributes to several open source projects, writes for InfoQ, O’Reilly, and Voxxed, and regularly presents at international conferences, including OSCON, QCon, and JavaOne.

Presentations

Creating an Effective Developer Experience on Kubernetes Session

Join this talk to learn about how to curate your perfect developer experience using Kubernetes.

Brendan Burns is a distinguished engineer at Microsoft Azure, where he runs the container service and resource manager teams, and a cofounder of the Kubernetes open source project. Previously, he worked at Google on cloud APIs and web search infrastructure and was a professor of computer science at Union College. Brendan holds a PhD in computer science from the University of Massachusetts Amherst and a BA in computer science and studio art from Williams College.

Presentations

Integrating developer and operator experience in Kubernetes Cloud Computing with Kubernetes

Developer and operator personas are often viewed as separate, but the truth on the ground is actually far more mixed. Developers often operate their own software, and operators often explore software to find and fix bugs. Brendan Burns covers this overlap, explaining how to build tooling and approaches that enable developers and operators to quickly switch or blend between the personas.

Panel discussion: The future of Kubernetes—Challenges and opportunities Cloud Computing with Kubernetes

Join this panel on the future of Kubernetes, as Sarah Wells, Brendan Burns, Kris Nova, and Alice Goldfuss explore upcoming challenges and opportunities.

Tammy Butow is a principal SRE at Gremlin, where she works on chaos engineering—the facilitation of controlled experiments to identify systemic weaknesses. Gremlin helps engineers build resilient systems using their control plane and API. Previously, Tammy led SRE teams at Dropbox responsible for the databases and storage systems used by over 500 million customers and was an IMOC (incident manager on call), where she was responsible for managing and resolving high-severity incidents across the company. She has also worked in infrastructure engineering, security engineering, and product engineering. Tammy is the cofounder of Girl Geek Academy, a global movement to teach one million women technical skills by 2025. Tammy is an Australian and enjoys riding bikes, skateboarding, snowboarding, and surfing. She also loves mosh pits, crowd surfing, metal, and hardcore punk.

Presentations

Chaos Day: When reliability reigns Keynote

Tammy Butow explains how your company can use Chaos Days to focus on controlled chaos engineering. Similar to Hack Days, Chaos Days encourage an open culture of engineering. However, instead of focusing on building features, Chaos Days help you focus on building more resilient systems and reducing incidents.

Chaos engineering bootcamp Tutorial

Chaos engineering is the discipline of experimenting on a distributed system in order to build confidence in the system’s capability to withstand turbulent conditions in production. Tammy Butow, Ana Medina, and Patrick Higgins lead a hands-on deep dive into chaos engineering, covering the tools and practices you need to implement it in your organization.

Francesc Campoy is the VP of Product at Dgraph: the most advanced distributed graph database.

Before that, he was VP of Product and Developer Relations at source{d}, the company enabling Machine Learning for large scale code analysis and building the platform for the future of developer tooling. Previously, he worked at Google as Senior Developer Advocate for Google Cloud Platform and the Go team.

He’s passionate about programming and programmers, especially Go and gophers. As part of his effort to help those learning, he’s given many talks and workshops at conferences like Google I/O, Gophercon(s), GOTO, or OSCON.

When he’s not on stage he’s probably coding, writing blog posts, or working on his justforfunc YouTube series where he hacks while cracking bad jokes.

Presentations

ML on code: Machine learning will change programming Keynote

Machine learning has revolutionized many fields, from cancer detection to self-driving cars. And let's not forget about connected toilets that allow Alexa to flush at your command. Francesc Campoy Flores explores some of the techniques used and the most relevant research, focusing on use cases where machine learning can help developers be more efficient.

Ian Coldwater is a DevSecOps engineer turned red teamer who specializes in breaking and hardening Kubernetes, containers, and cloud native infrastructure. In their spare time, they like to go on cross-country road trips, capture flags, and eat a lot of pie. Ian lives in Minneapolis and tweets as @IanColdwater.

Presentations

Ship of fools: Shoring up Kubernetes security Session

Ian Coldwater offers practical advice about securing your Kubernetes clusters, from an attacker’s perspective.

Ian Crosby is the managing director of Container Solutions, Montreal, where he works with organizations across many domains as they make the move to cloud native. His work spans application design, build pipelines, and cloud infrastructure, with a heavy focus on containers, orchestration, and other cloud native tools. A longtime software developer and advocate, Ian has worked across a wide range of industries from military defense systems to SaaS platforms. He’s passionate about improving how we build and run software.

Presentations

Kubernetes: Crossing the chasm Cloud Computing with Kubernetes

As Kubernetes enters the mainstream market, we are seeing more use cases that don't fit the original mold, each bringing a new set of challenges. Ian Crosby discusses three specific case studies, the challenges encountered adopting Kubernetes, and the solutions and tooling used to solve them.

Molly Crowther is a senior technical program manager at Pivotal, working on security strategy and vulnerability management for Cloud Foundry. She also leads security workshops for open and closed source Cloud Foundry teams in North America and Europe.

Presentations

Faster is safer: Security in the enterprise Session

Molly Crowther demonstrates how the enterprise can use cloud platforms to make security move at the pace of business—not the other way around.

Aish Raj Dahal is a San Francisco-based engineer at PagerDuty, where he is building PagerDuty’s event intelligence platform—and often deals with fallacies of distributed computing. His recent focus has been on Elixir/OTP and building event-driven microservices using Kafka and Elixir. Previously, he was an early employee at HackerRank and a programmer at Goldman Sachs.

Presentations

"Not invented here" syndrome and dark debt: The PagerDuty story Session

Finding the right balance between writing custom in-house software and using an off-the-shelf solution is difficult. Aish Raj Dahal sheds light on the age old build versus buy problem and "not invented here syndrome" by explaining how PagerDuty built a distributed task scheduler and later moved off it to use an off-the-shelf open source solution.

Anil Dash is CEO of Fog Creek Software, the creators of Glitch, the friendly community that helps everyone make the app of their dreams. An entrepreneur and activist, Anil is known as one of the most prominent advocates for a more inclusive and ethical technology industry. Previously, he was an advisor to the Obama White House’s Office of Digital Strategy and a monthly columnist for Wired. Today, he is a board member for Stack Overflow, the Data & Society Research Institute, and the Lower East Side Girls Club and an advisor to startups like Medium and nonprofits like Donors Choose. The New Yorker described him as a “blogging pioneer” for his Webby-recognized personal website, which began in 1999, and for his seminal work in helping create some of the first blogging and social media publishing tools. As a public speaker, Anil has taken the stage at events ranging from the Obama Foundation Summit to the Aspen Ideas Festival to SXSW. He has been a guest on media and podcasts ranging from Vice’s Desus and Mero to Krista Tippett’s On Being, collaborated with Hamilton creator Lin-Manuel Miranda, and created one of the most popular Spotify playlists of 2018. Time named @anildash one of the best accounts on Twitter, and it’s the only account ever retweeted by Bill Gates and Prince, a succinct encapsulation of Anil’s interests.

Presentations

Continuous Disintegration Keynote

As our industry faces its biggest reckoning ever with the social, ethical and cultural impacts of technology, what can we learn if we reflect on the assumptions we build into our systems? How could our processes and tools be designed to undo the biggest bugs and biases of today’s tech?

Jennifer Davis is a cloud operations advocate at Microsoft. Previously, she was a principal site reliability engineer at RealSelf and developed cookbooks to simplify building and managing infrastructure at Chef. Jennifer is the coauthor of Effective DevOps and speaks about DevOps, tech culture, and monitoring. She also gives tutorials on a variety of technical topics. When she’s not working, she enjoys learning to make things and spending quality time with her family.

Presentations

The ops in serverless Session

Rather than a future of NoOps, serverless has increased the need for specialized operations engineering. Jennifer Davis explores the role of operations in serverless, covering testing, monitoring, and debugging functions.

Bart De Vylder is a data scientist at CoScale. Previously, Bart was active in software engineering and architecture, with a focus on distributed systems. His interests lie in machine learning and building reliable, scalable data processing systems. Bart holds a PhD in artificial intelligence from the Free University of Brussels.

Presentations

Kafka Streams in practice: What works and what doesn’t (yet) Session

Bart De Vylder shares his experience migrating an existing codebase and production environment to Kafka Streams, a relatively new and promising streaming library. Join in to see what aspects worked remarkably well and the challenges he ran into along the way.

Rocio Delgado is a senior engineering manager at Slack. Rocio has been a backend engineer, tech lead, and manager for 13+ years. Previously, she was a senior engineering manager at GitHub and GE and a founding engineer at WorkMarket. She’s interested in performance, scalability, resilience, distributed systems, building and growing teams, and diversity and inclusion in tech. Her hobbies are yoga, dancing, and being a karaoke junior performer. She’s a resident of Brooklyn via México.

Presentations

Communicating and managing change Session

Evolving teams and evolving companies are a constant in the career of a leader; helping your team navigate through that change becomes critical to your success as a manager and for the organization. Rocio Delgado shares dos and don'ts for managing and communicating change in your team or organization, which may highlight where your own skills need to evolve.

Jaana B. Dogan is a software engineer at Google, where she works on observability of Go production services. She has a decade of experience building developer platforms and tools.

Presentations

Critical path-driven development Keynote

Scaling large systems and teams is hard. In the past decade, we may finally have found a critical tool that suggests this doesn't have to be the case. Jaana Dogan explains why Google teaches its tracing tools to "Nooglers" and how it helps them learn about Google-scale systems end to end without getting lost in the world’s largest systems company’s enormous code base.

Bret Fisher is a Virginia Beach-based freelance DevOps and Docker consultant, trainer, speaker, and open source volunteer. Bret has been a cloud and data center ops and system administrator for 20 years. Currently, he helps teams Dockerize their apps and systems and improve their speed of deployment, resiliency, metrics, and awareness (all that DevOps-y stuff). Bret is a Docker Captain and Code for America Brigade Captain. He runs several monthly meetups, speaks at conferences, and is obsessed with containerizing any app he sees. (He’ll likely talk your ear off about it next time you meet.) Bret also develops in Node.js, Bash, and general web, usually for open source projects. In his free time, he does CrossFit, surfs a little, geeks out in the awesome local dev community in Virginia Beach, and travels with his wife. He writes at Bretfisher.com and tweets at @bretfisher.

Presentations

Docker tools and workflows: From app development to production clusters 2-Day Training

Docker Captain Bret Fisher teaches you how to create containers, images, networks, and more using Docker Compose. Join in to practice your DevOps skills with a full day deploying multitier apps on server clusters with Swarm and other tools. This hands-on course covers over 50% of what’s needed for the Docker DCA certification.

Docker tools and workflows: From app development to production clusters (Day 2) Training Day 2

Docker Captain Bret Fisher teaches you how to create containers, images, networks, and more using Docker Compose. Join in to practice your DevOps skills with a full day deploying multitier apps on server clusters with Swarm and other tools. This hands-on course covers over 50% of what’s needed for the Docker DCA certification.

Liz Fong-Jones is a developer advocate, labor and ethics organizer, and site reliability engineer (SRE) at Honeycomb with 15+ years of experience. Previously, she was an SRE working on products ranging from the Google Cloud Load Balancer to Google Flights. She lives in Brooklyn with her wife, metamours, and a Samoyed/Golden Retriever mix, and in San Francisco and Seattle with her other partners. She plays classical piano, leads an EVE Online alliance, and advocates for transgender rights as a board member of the National Center for Transgender Equality.

Presentations

Building successful site reliability engineering in large enterprises Session

Implementing site reliability engineering (SRE) doesn't have to be intimidating, and it isn't only for cloud-native organizations. Liz Fong-Jones and Dave Rensin share eight key lessons Google's customer reliability engineering team learned helping large enterprises adopt SRE as an operations engineering model.

Jessica Frazelle is a software engineer at Microsoft, where she works with Linux and containers. Jess loves all things involving Linux namespaces and cgroups and is probably most well known for running desktop applications in containers. Jessica has been a maintainer of Docker and a contributor to RunC, Kubernetes, Linux, and Golang, among other projects, and maintained the AppArmor, seccomp, and SELinux bits in Docker. She is quite familiar with locking down containers.

Presentations

Linux, BPF, and containers Session

Jessica Frazelle explores some cool bits of Linux, including BPF and container technologies, and details new ways to trace various things in the kernel and how to even use these traces to hot patch kernels in the case of zero day vulnerabilities. Come for the jokes about Linux; stay for the live demos.

Laurent Gil is a security product strategy architect at Oracle Dyn, an Oracle Cloud Infrastructure global business unit. Previously, Laurent was the cofounder of Zenedge (acquired by Oracle in March 2018) and CEO and cofounder of Ukraine-based Viewdle, which focused on machine learning and computer vision (acquired by Google in 2012). Laurent holds a doctorate honoris causa from the Cybernetic Institute of Ukraine, an MBA from the Wharton School, an MSc in computer science and signal processing from Supélec, a postgraduate degree in management from the Collège des Ingénieurs in Paris, and a BS in mathematics (summa cum laude) from the University of Bordeaux.

Presentations

API security: What you absolutely need to know now (sponsored by Oracle Dyn) Session

API-based integration is fundamental to business strategy and continued success, but the explosion of APIs is creating incremental security risks that must be addressed. Laurent Gil explains why API security is quickly becoming a key cross-cutting concern for everyone from DevOps to the CISO.

Bot or human? Applying machine learning to combating the bot epidemic (sponsored by Oracle Dyn) Session

Bots now make up over 50% of website traffic and have become the primary source of malicious application attacks, from DDoS to sophisticated intrusions. Laurent Gil lays out what you need to know about bot traffic, different types of bots, and real-world applications of ML and AI to identify and defeat malicious bots.

Securing the edge: Understanding and managing security events (sponsored by Oracle Dyn) Keynote

Laurent Gil shares the latest cybersecurity research findings based on real-world security operations along with innovative approaches to managing and mitigating security events at the cloud edge.

Sébastien Goasguen built his first compute cluster while working on his PhD in the late ’90s, when they were still called Beowulf clusters; he’s been working on making computing a utility since then. He’s been focused on containers and container orchestration, founding the Kubernetes startup Skippbox, where he created kompose, Cabin, and kubeless. Active in the serverless community, he cofounded TriggerMesh, a serverless management platform that builds on top of Kubernetes and Knative. He can be found hiking the Jura or at open source conferences. He’s the author of the Docker Cookbook and coauthor of the Kubernetes Cookbook.

Presentations

Certified Kubernetes Application Developer (CKAD) prep + exam 2-Day Training

Can you develop and maintain applications using Kubernetes? That’s the question more employers are asking these days. Take the next step in your career by becoming a Certified Kubernetes Application Developer. You get a full day of test prep from O’Reilly’s top Kubernetes trainer and the opportunity to take the exam on-site, leaving an official Certified Kubernetes Application Developer.

Certified Kubernetes Application Developer (CKAD) prep + exam (Day 2) Training Day 2

Can you develop and maintain applications using Kubernetes? That’s the question more employers are asking these days. Take the next step in your career by becoming a Certified Kubernetes Application Developer. You get a full day of test prep from O’Reilly’s top Kubernetes trainer and the opportunity to take the exam on-site, leaving an official Certified Kubernetes Application Developer.

Opening remarks Event

Sébastien Goasguen welcomes you to Cloud Computing with Kubernetes Day.

Alice Goldfuss is a systems punk currently helping GitHub run its cutting-edge container platform. She loves kernel crashes, memory design, and performance hacks. Alice has consulted on some books, including Docker: Up & Running, Effective DevOps, and Site Reliability Engineering: Volume 2, presented at some conferences, such as SREcon, Velocity, and Container Summit, and run some others, including LISA17 and devopsdays Portland. You can follow her on Twitter, but you’ll probably regret it.

Presentations

Panel discussion: The future of Kubernetes—Challenges and opportunities Cloud Computing with Kubernetes

Join this panel on the future of Kubernetes, as Sarah Wells, Brendan Burns, Kris Nova, and Alice Goldfuss explore upcoming challenges and opportunities.

The container operator's manual Session

Containers can be a great infrastructure solution, but no one should drive them without a manual. Alice Goldfuss discusses some of the advantages and disadvantages of running containers in production at scale.

Ryan Gregg is a product manager at Google responsible for Knative and serverless on Kubernetes. He has over 15 years of experience working with developers on building and extending platforms and is passionate about great documentation and reducing developer toil. Previously, he spent more than a decade working on enterprise software platforms and cloud solutions at Microsoft.

Presentations

Knative: Kubernetes, serverless, and you Cloud Computing with Kubernetes

It's a Kubernetes world. Join Ryan Gregg to learn about Knative, an open source collaboration between Google and other industry leaders to define the future of serverless on Kubernetes. Knative solves the difficult but boring aspects of running modern cloud applications on Kubernetes.

Sam Guckenheimer is the product owner for Azure DevOps at Microsoft, where he acts as the chief customer advocate, responsible for the strategy for the next releases of these products, focusing on DevOps, Agile, and CI/CD pipelines. Sam also curates the website DevOps at Microsoft. Previously, Sam was director of product line strategy at Rational Software Corporation, now the Rational Division of IBM. He is a regular speaker and has given keynote addresses at conferences such as DevOps Enterprise Summit and Agile. He is the author of four books, most recently Journey to Cloud Cadence and Visual Studio Team Foundation Server 2012: Adopting Agile Software Practices—From Backlog to Continuous Feedback. Sam lives in the Seattle area with his wife in a sustainable house they built.

Presentations

60,000 tests in six minutes: Create a reliable pipeline, eliminate flaky tests, and deploy safely but quickly Session

Good test coverage is essential for catching issues before a pull request has been merged, but they have to be the right kind of tests and must be reliable. Drawing on his experience at Microsoft, Sam Guckenheimer details what type of tests to do in your DevOps pipeline, when you should do them, and why.

Arun Gupta is a principal open source technologist at Amazon Web Services. Previously, Arun built and led developer communities at Sun, Oracle, Red Hat, and Couchbase. He has deep expertise leading cross-functional teams to develop and execute strategy and in planning and executing content, marketing campaigns, and programs. He’s also led engineering teams at Sun and was a founding member of the Java EE team. Arun is an avid runner, a globe trotter, a Java Champion, a four-year consecutive JavaOne Rock Star, a JUG leader, NetBeans Dream Team member, and a Docker Captain. He’s authored more than 2,000 blog posts on technology and has given talks in more than 40 countries. He founded the Devoxx4Kids chapter in the US and continues to promote technology education among children. He’s easily accessible at @arungupta.

Presentations

Using chaos to bring resiliency to Kubernetes applications Cloud Computing with Kubernetes

Arun Gupta explains how to use chaos engineering principles for your applications deployed in Kubernetes. Join in to learn about the basic concepts of chaos engineering and explore toolkits that enable these principles. You'll leave with working samples.

Michael Hamrah is the chief architect at Namely, where he’s leading the development of Namely’s platform. A software engineer with more than 15 years of experience, Michael was previously director and principal engineer at Getty Images working on sports, news, and entertainment tools and moved Getty’s Asset Management Platform to the cloud. He was also a senior software engineer at Uber working on metrics and monitoring.

Presentations

Frankenstein's microservices: How to avoid the monster Session

Many companies adopt microservices to break down monoliths, but they soon uncover a hidden cost: How do you manage all these new interconnected things popping up? Michael Hamrah explains how to avoid creating Frankenstein's monster by understanding elements of a microservice platform. . .so you can sleep at night.

Michael Hausenblas is a developer advocate at AWS, part of the container service team, focusing on container security. Michael shares his experience around cloud native infrastructure and apps through demos, blog posts, books, and public speaking engagements and contributes to open source software. Previously, he was at Red Hat, Mesosphere, and MapR and at two research institutions in Ireland and Austria.

Presentations

Troubleshooting Kubernetes applications Session

Michael Hausenblas walks you through troubleshooting applications running in Kubernetes, from application-level debugging to distributed tracing to chaos engineering.

Patrick Higgins is a UI engineer at Gremlin, where he helps developers unleash the power of controlled chaos. He is passionate about finding effective ways to make UIs resilient to failure. He fills his weekends with playing soccer and assisting with civic causes that he cares about.

Presentations

Chaos engineering bootcamp Tutorial

Chaos engineering is the discipline of experimenting on a distributed system in order to build confidence in the system’s capability to withstand turbulent conditions in production. Tammy Butow, Ana Medina, and Patrick Higgins lead a hands-on deep dive into chaos engineering, covering the tools and practices you need to implement it in your organization.

Won Jun Jang is an observability engineer at Uber Technologies, working on distributed tracing, monitoring, and performance. In his spare time, he gets lectured by his life coach to write a more interesting bio to sell himself better.

Presentations

Tracing polyglot systems: An OpenTracing tutorial Tutorial

Priyanka Sharma and Yuri Shkuro demonstrate how distributed tracing works and how to employ it in the development and operations of your applications in the programming language of your choice: Java, Go, Python, Node.js, C#, or C++.

Kavya Joshi is a backend and systems developer at Samsara. Her favorite aspects of being a programmer are reasoning about systems at scale and delving into the inner workings of sophisticated software. When not programming, Kavya tends to spend her time on rock walls and mountaintops.

Presentations

Practical performance theory Keynote

Performance theory offers a rigorous and practical approach to performance tuning and capacity planning. Kavya Joshi dives into elegant results like Little’s law and the Universal Scalability Law. You'll also discover how performance theory is used in real systems at companies like Facebook and learn how to leverage it to prepare your systems for flux and scale.

Michael Kehoe is a site reliability engineer at LinkedIn, where he specializes in building and maintaining reliable, scalable system infrastructure. Previously, he worked with networks at the University of Queensland, built small satellites at NASA, and wrote thermal environments software at Rio Tinto.

Presentations

Monitoring your containers correctly Tutorial

Michael Kehoe walks you through building a small monitoring utility for cgroup containers to illustrate best practices in container monitoring. You'll explore various cgroup constraints and learn how to specifically monitor for each of them to ensure that your application is behaving as expected. Along the way, Michael shares tricks and tips about monitoring containerized applications.

Ameet Kotian is a senior storage operations engineer at Slack, where he is responsible for operating the infrastructure used to store all of Slack’s data. Previously, he was one of the first site reliability engineers at Twitter, where he spent close to five years working on the graph storage service and Twitter’s internal distributed database, Manhattan. His work related to large-scale automated deployments was featured on Twitter’s engineering blog.

Presentations

Smooth scaling: Slack’s journey toward a new database Session

Slack’s rapid growth over the last few years outpaced the original database’s scaling capacity, which negatively impacted the company's customers and engineers. Ameet Kotian explains how a small team of engineers embarked on a journey for the right database solution, which eventually led them to Vitess, an open source cluster database.

Bridget Kromhout is a principal cloud advocate at Microsoft. Her CS degree emphasis was in theory, but she now deals with the concrete (if the cloud can be considered tangible). After 15 years as an operations engineer, Bridget traded being on call for being on a plane. A frequent speaker and program committee member for tech conferences, she leads the Devopsdays organization globally and the DevOps community at home in Minneapolis, Minnesota. She podcasts with Arrested DevOps, blogs at Bridgetkromhout.com, and is active in a Twitterverse near you.

Presentations

Kubernetes 101 Tutorial

Bridget Kromhout walks you through launching clusters and details all the moving parts you need to know about to use Kubernetes in production.

Bridget Lane is a software developer for Gannett and USA Today, where her day-to-day job involves deep-diving into Golang APIs, API management, and cache setup. In her free time, Bridget enjoys cooking, playing board games, and slaying ferocious beasts as a sorcerer in the distant realm of Dungeons and Dragons.

Presentations

From silos to a single pane of glass at USA TODAY NETWORK Session

Three years ago, technical teams at USA TODAY NETWORK were completely siloed, which made improvements and troubleshooting difficult and often left teams blind to the rest of the technical organization. Bridget Lane and Kris Vincent explain how drastically the teams' tool belts, thought processes, and goals have changed as the company moved from silos to a single pane of glass.

Maude Lemaire is a San Francisco-based backend engineer at Slack, where she’s working to scale the enterprise product to support some of the world’s largest companies. Maude spends most of her time chasing down people making network calls in a loop, refactoring unwieldy chunks of code, and consolidating redundant database schemas.

Presentations

How to get away with refactoring Session

How do you refactor major, core functionality in a million-line codebase without disrupting the entire system? Maude Lemaire explains how Slack overhauled channels and shares the many obstacles the company overcame to boost both application performance and company-wide developer productivity (with only a few hiccups).

Moishe Lettvin is a software engineer at MailChimp working on backend projects. Moishe has been writing software since the ’80s, when he worked on a DOS TSR written in a combination of Pascal and assembly. Since then, he’s worked on projects from Microsoft Outlook to Google App Engine. Once upon a time, Moishe made a CRT explode with a software bug.

Presentations

Strategies for better technical interviews Session

Technical interviewing is profoundly important, but unfortunately, it's easy to do poorly and very difficult to do well. Moishe Lettvin outlines strategies for reducing bias and increasing the fidelity of your technical interviews.

Idit Levine is the founder and CEO of solo.io, a Boston-based startup whose mission is to streamline the cloud stack. Solo recently released Squash, an open source platform for debugging microservices applications. Idit has been in the cloud management space for 12 years, working at both enterprise and startup companies. Previously, she was the CTO of the Cloud Management Division at EMC and a member of its global CTO Office, where she and her team introduced successful open source projects for automating unikernels (UniK) and for cross-cluster scheduling (layer-x).

Presentations

Debugging microservices apps via a service mesh, OpenTracing, and Squash Session

Idit Levine demonstrates common debugging techniques and offers an overview of Squash, a new tool and methodology that brings the power of modern popular debuggers to developers of microservices apps that run on container orchestration platforms.

Gloo Kubernetes together with Hybrid Infrastructure Cloud Computing with Kubernetes

Gloo is the Envoy-based Ingress, Gateway, and GraphQL platform for gluing together infrastructure from any stack. Come watch us use Gloo to combine Kubernetes microservices, serverless functions, and a legacy application into a single application and debug it.

Richard Li is cofounder and CEO of Datawire. Datawire supports several popular open source tools for Kubernetes, including Telepresence (for local development on Kubernetes) and the Ambassador API Gateway. Richard is a veteran of multiple technology startups, including Duo Security, Rapid7, and Red Hat. He is a recognized Kubernetes and microservices expert and has spoken at numerous conferences, including ApacheCon, the Microservices Practitioner Summit, and O’Reilly Velocity. He holds both a BS and MEng in computer science from MIT.

Presentations

The simply complex task of implementing Kubernetes ingress: Lessons learned Cloud Computing with Kubernetes

Getting traffic into a Kubernetes cluster should be simple, but it’s not. The range of options can be confusing, and implementing effective configuration is equally challenging. Richard Li discusses the evolution of ingress on Kubernetes, explains why ingress controllers aren’t necessarily the best approach, and shares a series of lessons learned about managing traffic ingress.

Bryan Liles is an engineer at Heptio. When he is not writing software to help move teams to Kubernetes, he gets to speak at conferences on topics ranging from machine learning to building the next generation of developers. In his free time, Bryan races cars in straight lines and around turns and builds robots and devices.

Presentations

Building continuous delivery with Kubernetes as opposed to installing one on Kubernetes

With the introduction of Kubernetes to your organization comes the ability to implement new patterns enabled by the new platform. Bryan Liles explains how you can use Kubernetes's rich set of APIs to build a continuous delivery pipeline in the platform instead of on top of it. Now you can use your Kubernetes tool chest to control how you get software to production.

Roger Magoulas is the vice president of O’Reilly Radar. Previously, Roger was the research director at O’Reilly, where he and his team built the company’s analysis infrastructure and provided analytic services and insights on technology-adoption trends to business decision makers at O’Reilly and beyond. He and his team find what excites key innovators and use those insights to gather and analyze faint signals from various sources to make sense of what others may adopt and why.

Presentations

O’Reilly Radar: Open source tool trends—What our users tell us Keynote

Using aggregate analysis of O'Reilly Safari usage and search data, Roger Magoulas shares key insights and trends that are impacting the open source tools ecosystem—trends you can use to help make decisions that affect your next project, your organization’s strategic direction, and your own career.

Nikki McDonald is a content director at O’Reilly Media, where she writes, edits, and works with the industry’s leading practitioners to develop books, online courses, and training videos to help engineers and developers collaborate more effectively and create and deploy complex distributed systems. She also cochairs O’Reilly’s Velocity Conference, held annually in San Jose, New York, and London. Nikki started out as a features editor at MacUser magazine back when people were still dialing up to the internet with AOL. She lives in Ann Arbor, MI.

Presentations

Closing remarks Keynote

Cochairs Nikki McDonald, James Turnbull, and Ines Sombra close the second day of keynotes.

Closing remarks Keynote

Cochairs Nikki McDonald, James Turnbull, and Ines Sombra close the first day of keynotes.

Tuesday opening welcome Keynote

Cochairs Nikki McDonald, James Turnbull, and Ines Sombra welcome you to the first day of keynotes.

Wednesday opening welcome Keynote

Cochairs Nikki McDonald, James Turnbull, and Ines Sombra welcome you to the second day of keynotes.

Jessica McKellar is a founder and the CTO of Pilot, a bookkeeping firm powered by software. Previously, she was a founder and the vice president of engineering for a real-time collaboration startup acquired by Dropbox, where she then served as a director of engineering. Before that, she was a computer nerd at MIT who joined her friends at Ksplice, a company building a service for rebootless kernel updates on Linux (acquired by Oracle). Jessica is a former director for the Python Software Foundation and PyCon North America’s Diversity Outreach Chair. For her outreach efforts in the Python community, she was awarded the O’Reilly Open Source Award in 2013.

Presentations

The programmer's mind Keynote

The programmer's mind is inherently an activist's mind. Jessica McKellar draws parallels between the free and open source software movement and the work to end mass incarceration.

Ana Medina is a San Francisco-based chaos engineer at Gremlin, where she helps companies avoid outages by running proactive chaos engineering experiments. Previously, she was an engineer on the SRE and infrastructure teams at Uber, specifically focusing on chaos engineering and cloud computing. She tweets at @Ana_M_Medina, mostly about traveling, diversity in tech, and mental health.

Presentations

Chaos engineering bootcamp Tutorial

Chaos engineering is the discipline of experimenting on a distributed system in order to build confidence in the system’s capability to withstand turbulent conditions in production. Tammy Butow, Ana Medina, and Patrick Higgins lead a hands-on deep dive into chaos engineering, covering the tools and practices you need to implement it in your organization.

James Meickle is a site reliability engineer at Quantopian, a Boston startup making algorithmic trading accessible to everyone. In past roles, he’s been responsible for processing MRI scans at the Center for Brain Science at Harvard University, sales engineering and developer evangelism at AppNeta, and release engineering during the Romney for President 2012 campaign. Between NYSE trading days, he advises devopsdays Boston and conducts Ansible trainings on O’Reilly’s Safari platform. What free time remains is dedicated to cooking, sci-fi, permadeath video games, and Satanism.

Presentations

Ansible for SRE teams Tutorial

Ansible is a "batteries included" automation, configuration management, and orchestration tool that's fast to learn and flexible enough for any architecture. Join James Meickle to get started with Ansible, with an eye toward sustainable development in cloud environments.

Sell cron, buy Airflow: Modern data pipelines in finance Session

Quantopian integrates financial data from vendors around the globe. As the scope of its operations outgrew cron, the company turned to Apache Airflow, a distributed scheduler and task executor. James Meickle explains how in less than six months, Quantopian was able to rearchitect brittle crontabs into resilient, recoverable pipelines defined in code to which anyone could contribute.

Daniel Mennell helps Mesosphere customers realize the benefits of cloud computing, whether public, private, or hybrid in nature. Dan has nearly 20 years of experience helping organizations big and small overcome development and operational IT challenges. Whether it is managing networks, servers, and the applications that run on them or shortening a continuous delivery pipeline, Dan has been involved.

Presentations

Zero to Kubernetes in five minutes (sponsored by Mesosphere) Session

Getting Kubernetes up and running is only half the battle. Now you need to get the supporting infrastructure in place. Dan Mennell shares a templated approach to deploying what is needed to get started with source control, CI/CD, and monitoring with Prometheus, along with other things.

Anubhav Mishra is a developer advocate at HashiCorp. He created Atlantis, an open source project that helps teams collaborate on infrastructure using Terraform. Previously, he worked at Hootsuite, where he built distributed systems and a microservice delivery platform. Anubhav loves open source software and is continuously finding ways to contribute to projects that excite him and to help developers and operators do better. That has led him to contribute to Virtual Kubelet and Helm, both Cloud Native Computing Foundation (CNCF) projects. In his free time, he DJs, makes music, and plays football. He’s a huge Manchester United supporter.

Presentations

Smart networking with service meshes Tutorial

Over the past year, service meshes have gained significant interest. Most service meshes have two components: a control plane and a data plane. Anubhav Mishra explains what it takes to build a scalable control and data plane. Anubhav also discusses how HashiCorp Consul provides many features like a distributed key-value store and service discovery that make it ideal for a control plane.

Christian Monaghan is the infrastructure lead for HealthCare.gov’s core systems and the cofounder of Nava PBC, a startup working across numerous federal agencies to radically improve how government serves people through technology. He lives in Washington, DC.

Presentations

Lessons learned migrating HealthCare.gov to Terraform Session

Christian Monaghan explains how he and his team successfully migrated HealthCare.gov, America's largest government website, to the cloud infrastructure provisioning tool Terraform, shares lessons learned along the way, and details how you can effectively use Terraform for your next project.

Aviran Mordo is the VP of engineering at Wix. In his 20+ years in the software industry, Aviran has held a number of engineering roles and leading positions, from designing and building the US National Archives Electronic Records Archives prototype to building large search engine infrastructures. Aviran has vast knowledge of internet technologies, software development, and continuous delivery. He is a technology blogger as well as a dev-centric culture advocate.

Presentations

You've been arrested by the CAP; you have the right to remain consistent. Session

Aviran Mordo discusses the challenges and real-life use cases of handling data in a distributed environment.

Effie Mouzeli is a site reliability engineer at the Wikimedia Foundation, where she’s one of the newer members of the SRE team. She studied physics and scientific computing but decided to follow neither. Instead she became a sysadmin, later a systems engineer, now an SRE. She worked in a number of startups and small organizations where her responsibilities were usually automation, infrastructure architecture, and working closely with developers. Away from work, she loves camping, concerts, and dressmaking.

Presentations

Small-scale engineering Session

Effie Mouzeli explains why small-scale engineering is just as challenging as large-scale engineering and offers ideas on how to survive technical debt, poor communication, and other everyday challenges.

Leemay Nassery is a senior engineer leading the recommendations and targeting engineering efforts at Comcast. She also sets the strategic direction for content personalization for Comcast’s Xfinity consumer-facing video products and leads efforts with A/B testing, testing and targeting, and producing the metrics to measure successful customer outcomes.

Presentations

Migrating a recommendations platform from bare metal to the cloud Session

Leemay Nassery discusses the importance of data collection pipelines and explains how to efficiently store datasets with the intention of making them easily accessible by a downstream machine learning platform.

Mike Newswanger is a cloud engineer at Elastic. Previously, he was a site reliability engineer at Stack Overflow. Mike enjoys solving problems, particularly those related to scale, and tackling the challenges of creating secure, performant, resilient, and maintainable code and infrastructure. Over his career, he’s worked in all parts of the application development lifecycle, from inception to deployment.

Presentations

Bulk image processing using Kubernetes Session

Mike Newswanger explains how he used Kubernetes and Google Cloud to burst and extend the capacity of a physical infrastructure for optimizing almost 10 million images in less than two weeks.

Amy Nguyen is a software engineer working on global economic infrastructure at Stripe. Outside of work, Amy writes about the tech industry, loves baking, and reads too many self-improvement books.

Presentations

How to break up with your vendor Session

You're unsatisfied with one of your monitoring providers. You've considered finding a new solution, but the thought of migrating your data off their platform sounds extremely painful. Amy Nguyen and Cory Watson explain how to make a deadline for an infrastructure-critical software migration while ensuring that everyone's requirements are met and no data is lost.

Victoria Nguyen is a network systems engineer at Fastly. She loves rock climbing and Halloween.

Presentations

Networks, echolocation, and fish GIFs Session

Victoria Nguyen explains how Fastly overhauled the monitoring and data collection of its globally distributed network without its caches noticing.

Tuli Nivas is a principal performance engineer at Salesforce with extensive experience in design and implementation of test automation and monitoring frameworks. Her interests lie in software testing, cloud computing, big data analytics, systems engineering, and architecture. Tuli holds a PhD in computer science with a focus on building processes to set up robust and fault-tolerant performance engineering systems.

Presentations

Performance anomaly detection at scale (sponsored by Salesforce) Session

Automated anomaly detection in production using simple data science techniques enables you to more quickly identify an issue and reduce the time it takes to get customers out of an outage. Tuli Nivas shows how to apply simple statistics to change how performance data is viewed and how to easily and effectively identify issues in production.

Kris Nova is independent, focusing on containers, infrastructure, and Kubernetes, and she’s an ambassador for the Cloud Native Computing Foundation. Previously, she was a developer advocate and an engineer on Kubernetes at Heptio. Kris has a deep technical background in the Go programming language and has authored many successful open source tools in Go. She’s a Kubernetes maintainer and the creator of kubicorn, a successful Kubernetes infrastructure management tool. Kris organizes a special interest group in Kubernetes and is a leader in the community. She understands the grievances with running cloud native infrastructure via a distributed cloud native application and recently authored an O’Reilly book on the topic, Cloud Native Infrastructure. Kris lives in Seattle and spends her free time climbing mountains.

Presentations

Moving an enterprise monolith to Kubernetes Session

Kris Nova tells the true and painful story of what it's like to move a monolithic enterprise app to running in a container in Kubernetes. Kris then prototypes a production environment that is designed to be as hard as possible to containerize and liberates the application into a scalable and modern cloud-native environment.

Panel discussion: The future of Kubernetes—Challenges and opportunities Cloud Computing with Kubernetes

Join this panel on the future of Kubernetes, as Sarah Wells, Brendan Burns, Kris Nova, and Alice Goldfuss explore upcoming challenges and opportunities.

Ian Nowland is the SVP, engineering manager of compute platform at Two Sigma. He leads the teams building the platforms to migrate Two Sigma from large-scale on-premises infrastructure to a Kubernetes-based, cloud-native, elastic public-cloud infrastructure. Before that, he founded and managed the EC2 Nitro team at AWS, building up that public-cloud infrastructure.

Presentations

Managing by missing Session

Ian Nowland challenges you to think broadly about the meaning of a "miss" and explains how a philosophy of owning and learning from misses allows you to avoid more in the future. This enables you to grow as a manager and so grow your impact on your organization.

Heather Osborn is a senior director of systems infrastructure at Ticketmaster. Heather has been a system and operations engineer for the last 25 years. Although it’s not common in the tech world, she’s stayed with Ticketmaster for the last 20 years through the company’s various incarnations, partly because of multiple technology reinventions and unique challenges and partly because she wants to see what will happen next. She’s looking forward to this new era of public cloud and container orchestration. Heather is an avid long-distance runner who has lots of time to think about these things while pounding the pavement.

Presentations

Archaic to orchestrated: Ticketmaster's hybrid DevOps transformation Session

Heather Osborn explains how Ticketmaster moved from a siloed on-premises environment to a DevOps hybrid cloud. If a company whose technology and human infrastructure have grown up organically around a custom-written VAX operating system can make the move to public cloud-native applications and begin a rapid march to a hybrid cloud solution, so can you.

Jérôme Petazzoni is a DevOps advocate and international speaker. He was born and raised in France, where he worked on geographic information systems, voice over IP, video streaming, and encoding and started a cloud hosting company back when EC2 wasn’t an Amazon product yet. In California he built and scaled the dotCloud PaaS, which eventually gave birth to Docker. While at Docker, he represented the company at hundreds of conferences and events and trained thousands of engineers to use Docker, Swarm, and Kubernetes. He’s fluent in many languages (mostly programming ones), owns a dozen musical instruments, and can play the theme of Zelda on most of them.

Presentations

Kubernetes bootcamp: Deploying and scaling microservices 2-Day Training

Kubernetes has a reputation for being complex to set up and operate, but that doesn't have to be the case. Join Jérôme Petazzoni to explore Kubernetes concepts and architecture and learn how to use it to deploy and scale your applications. The content is suitable for all kinds of deployment models, from the cloud (AKS, EKS, GKE, kops, etc.) to on-premises.

Kubernetes bootcamp: Deploying and scaling microservices (Day 2) Training Day 2

Kubernetes has a reputation for being complex to set up and operate, but that doesn't have to be the case. Join Jérôme Petazzoni to explore Kubernetes concepts and architecture and learn how to use it to deploy and scale your applications. The content is suitable for all kinds of deployment models, from the cloud (AKS, EKS, GKE, kops, etc.) to on-premises.

Neil Peterson is a senior content engineer at Microsoft, where he delivers technical documentation and samples with a focus on Azure and containers. A data center and cloud enthusiast, Neil has 15 years’ experience in large data center deployment, management, and maintenance operations.

Presentations

Consuming cloud services with the Kubernetes Service Catalog Session

Neil Peterson leads a technical deep dive into using the Kubernetes Service Catalog to dynamically provision and consume managed cloud services.

Guy Podjarny is Snyk’s cofounder and CEO, focusing on using open source and staying secure. Guy was previously CTO at Akamai following its acquisition of his startup, Blaze.io, and worked on the first web app firewall and security code analyzer. Guy is a frequent conference speaker and the author of O’Reilly’s Securing Open Source Libraries, Responsive & Fast, and High Performance Images.

Presentations

Securing serverless by breaking in Session

Serverless shuffles security priorities, naturally mitigating certain risks while elevating others, as this live hacking session vividly demonstrates. Guy Podjarny breaks into a vulnerable demo serverless app while explaining each security mistake, its impact, and how it can be avoided. You'll leave knowing why you need to keep your functions secure and how to do it yourself.

Prithvi Raj is an observability engineer working on Uber’s distributed tracing system, Jaeger.

Presentations

Tracing polyglot systems: An OpenTracing tutorial Tutorial

Priyanka Sharma and Yuri Shkuro demonstrate how distributed tracing works and how to employ it in the development and operations of your applications in the programming language of your choice: Java, Go, Python, Node.js, C#, or C++.

Alex Rasmussen is the vice president of engineering at Freenome, an AI genomics company with a unique approach to detecting cancer at its earliest stages and helping physicians optimize the next generation of precision therapies. He holds a PhD from the University of California, San Diego, where his dissertation focused on highly efficient large-scale data processing systems. While at UCSD, he led the TritonSort project, which set several world records in large-scale sorting.

Presentations

How do we solve the world's spreadsheet problem? Session

In the past five years, Alexander Rasmussen has spent a lot of time trying to get high-integrity data out of spreadsheets and into databases. Alexander explores common data integrity problems when dealing with spreadsheet data, investigates whether those integrity problems are inescapable, and shares ongoing work to mitigate them.

Dave Rensin is the director of customer reliability engineering (CRE) at Google. His team takes Google SREs focused on the reliability and availability of internal Google systems and focuses them on the reliability and availability of customer production systems running on Google Cloud. His mission is to teach Google customers how to design, build, and run highly available systems using Google SRE practices and tools. Dave is the author of several books, including two for O’Reilly, and holds more than a dozen patents in distributed systems, data acquisition, access control, and pattern matching.

Presentations

Building successful site reliability engineering in large enterprises Session

Implementing site reliability engineering (SRE) doesn't have to be intimidating, and it isn't only for cloud-native organizations. Liz Fong-Jones and Dave Rensin share eight key lessons Google's customer reliability engineering team learned helping large enterprises adopt SRE as an operations engineering model.

How do DevOps and SRE relate? Hint: They're best friends. (sponsored by Google Cloud) Keynote

SRE has exploded in the industry over the last two years, with the publication of two best-selling books from Google. Not surprisingly, there have been questions about how SRE and DevOps relate. Do they compete? Do they reinforce each other? The short answer is that they make each other better. Join Dave Rensin to hear why.

Liz Rice is the technology evangelist at container security specialists Aqua Security and coauthor of the O’Reilly report Kubernetes Security. She has a wealth of software development, team, and product management experience from her years spent working on network protocols and distributed systems and in digital technology sectors such as video on demand (VOD), music, and voice over internet protocol (VoIP). When not building startups and writing code, Liz loves riding bikes in places with better weather than her native London or racing in virtual reality on Zwift.

Presentations

A programmer's guide to secure connections Session

Beyond looking out for a little green padlock in the browser bar, what do you need to know about secure connections as a programmer? What do people mean by terms like authentication, verifying a certificate, or signing a message? Join Liz Rice as she demystifies HTTPS, TLS, X.509, and more.

Matt Rogish is cofounder and CEO of ReactiveOps, a DevOps-as-a-service and consulting company, where he’s led growth from zero to 20 people (and growing), built a company, product, and strategy, managed P&L, and (so far) kept it from cratering into the ground. Matt has been a Ruby and Rails developer since 2006. Previously, he was CTO of several startups (all using Rails), including, most recently, Rails Machine, a Rails hosting company.

Presentations

How NTSB air disaster analysis can help you in an emergency Session

Matt Rogish explains how NTSB investigations of air disasters have dramatically improved flight safety and applies lessons learned in disaster recovery and analysis, teamwork, task saturation, and systems design to modern software application and infrastructure architecture at scale to achieve higher availability, reduced errors, and more scalable systems.

Casey Rosenthal is cofounder and CEO at Verica.io. Previously, he was CTO at Backplane and an executive manager and senior architect, where he managed teams tackling big data, architected solutions to difficult problems, and trained others to do the same. He seeks opportunities to leverage his experience with distributed systems, artificial intelligence, translating novel algorithms and academia into working models, and selling a vision of the possible to clients and colleagues alike. For fun, Casey models human behavior using personality profiles in Ruby, Erlang, Elixir, Prolog, and Scala.

Presentations

Deprecating simplicity Session

Join Casey Rosenthal to learn how to use chaos engineering to embrace complexity and navigate it rather than reject complexity and try to erase it.

James Royalty is a lead engineer at NS1.

Presentations

Rebuilding the airplane in flight. . .safely Session

Rewriting the key software component of your platform from scratch is always intimidating. Shannon Weyrick and James Royalty discuss NS1's recent DNS server rewrite and outline the steps the company took to roll it out across its globally distributed network with no downtime.

Deepank Sharma is a senior member of the technical staff at Verizon Wireless, where he is working on a strategy to seamlessly and efficiently migrate on-premises applications to the cloud. Previously, Deepank was an architect at Verizon FiOS and Time Warner Cable, where he migrated applications from monolithic to microservices architecture and worked on various ecommerce platforms. He is deeply interested in the open source community and likes creating software that makes developers’ lives as easy as possible so they can focus on developing and delivering business logic. Deepank loves to learn new technology and spend time with his wife and kid. He is a hardcore NBA fan and supports the Golden State Warriors.

Presentations

Easy CI/CD done right (sponsored by Verizon) Session

Deepank Sharma offers an end-to-end look at continuous integration and continuous delivery (CI/CD), using Pipeline as Code and templates, quality gates, Docker, and Kubernetes while following best DevOps practices.

Matt Sheppard is a manager of internal DNS systems at Oracle Dyn. Matt has been a leader and manager of developers, application architects, software engineers, and database administrators and a project manager in several industries. He has an education in engineering and frontline experience. Matt is interested in hard problems and uncharted territory.

Presentations

Go fly a kite

Construction on the first bridge across Niagara Falls began by flying a kite to get a string to the other side, which was used to pull a rope—and eventually, other materials—across. Matt Sheppard applies this pattern to software, establishing a thread through a proposed system, building just enough infrastructure to execute the simplest operation possible, and iterating from there.

Yuri Shkuro is a software engineer at Uber, working on distributed tracing, reliability, and performance. Yuri is the coauthor of the OpenTracing standard (a CNCF project) and the creator of Jaeger, Uber’s open source distributed tracing system (also a CNCF project).

Presentations

Tracing polyglot systems: An OpenTracing tutorial Tutorial

Priyanka Sharma and Yuri Shkuro demonstrate how distributed tracing works and how to employ it in the development and operations of your applications in the programming language of your choice: Java, Go, Python, Node.js, C#, or C++.

Ines Sombra is director of engineering at Fastly, where she spends her time helping the web go faster. Ines holds an MS in computology with an emphasis on cheesy ’80s rock ballads. She has a fondness for steak, fernet, and a pug named Gordo. In a previous life, she was a data engineer.

Presentations

Closing remarks Keynote

Cochairs Nikki McDonald, James Turnbull, and Ines Sombra close the second day of keynotes.

Closing remarks Keynote

Cochairs Nikki McDonald, James Turnbull, and Ines Sombra close the first day of keynotes.

Tuesday opening welcome Keynote

Cochairs Nikki McDonald, James Turnbull, and Ines Sombra welcome you to the first day of keynotes.

Wednesday opening welcome Keynote

Cochairs Nikki McDonald, James Turnbull, and Ines Sombra welcome you to the second day of keynotes.

Sharon Steed is the founder and CEO of Communilogue, a consultancy that helps companies communicate better with their audience as well as their fellow team members through workshops, one-on-one coaching, and the online community at Communilogue.com. Sharon is a subject-matter expert on communicating with empathy. A lifelong stutterer, she uses her speech impediment to teach both what empathy is and how to be empathetic. Sharon’s course Communicating with Empathy is available on both LinkedIn Learning and Lynda.com. She has spoken at companies on improving team communication and collaboration and at tech conferences on vulnerability as an asset; she’s also given a TEDx talk on empowering insecurities. Sharon lives in Pittsburgh, PA.

Presentations

Creating cultures of empathy Session

Sharon Steed explains what empathy is (and what it isn't) and gives you the tools you need to cultivate an empathy mindset at work and in life.

Dave Stenglein is senior vice president of architecture and engineering at Kenzan, where he leads teams building cloud-native, robust, scalable, and secure systems. In his 23 years in technology, Dave has worked in a range of industries including internet, finance, cable, and media in roles from sysadmin to developer and architect.

Presentations

Making CI/CD accessible (sponsored by Kenzan) Session

Dave Stenglein demonstrates how to get up and running with Kubernetes, Spinnaker, and continuous delivery in hours, not days.

James Thompson is a principal software engineer at Mavenlink, where he is committed to helping engineering teams become more deliberate in how they build software through developing strong learning cultures, principled engineering practices, and holistic architectural thinking. He has worked with web technologies since 2003.

Presentations

Beyond accidental architecture Session

Accidental architecture is a product of circumstances rather than deliberate development toward a goal. James Thompson explains why it's best addressed by equipping teams to make more deliberate and informed technical decisions.

Laura Thomson is senior director of engineering for Firefox engineering operations at Mozilla and serves on the board of ISRG, the nonprofit behind Let’s Encrypt. Laura has held a number of engineering roles in her decade at Mozilla. Previously, she was principal and vice president at OmniTI as well as an engineer, consultant, and computer science academic. She has written several books on various open source technologies, including PHP and MySQL Web Development (with Luke Welling, her husband). Laura is from Melbourne, Australia, but now lives with her family on a rustic horse farm in Maryland, where she relaxes by indulging in manual labor and other rural pursuits.

Presentations

Practical ethics Keynote

Laura Thomson shares Mozilla's approach to data ethics, review, and stewardship, including practical open source guidelines for lean data and examples of when the company has gotten it right (and wrong).

James Turnbull is VPE at Glitch. A longtime member of the open source community, James is the author of a number of books about open source software. Previously, he was a CTO in residence at Microsoft, founder and chief technology officer at Empatico and Kickstarter, VPE of Venmo, and an adviser at Docker. James likes food, wine, books, photography, and cats. He is not overly keen on long walks on the beach or holding hands.

Presentations

Closing remarks Keynote

Cochairs Nikki McDonald, James Turnbull, and Ines Sombra close the second day of keynotes.

Closing remarks Keynote

Cochairs Nikki McDonald, James Turnbull, and Ines Sombra close the first day of keynotes.

Tuesday opening welcome Keynote

Cochairs Nikki McDonald, James Turnbull, and Ines Sombra welcome you to the first day of keynotes.

Wednesday opening welcome Keynote

Cochairs Nikki McDonald, James Turnbull, and Ines Sombra welcome you to the second day of keynotes.

Jeff Valeo is a lead site reliability engineer on the cloud infrastructure team at Grubhub. Previously, Jeff was a technical lead at Apple and an engineer at Google.

Presentations

Microreleases: How to safely rollout complex changes at scale Session

Jeffrey Valeo explains how to safely roll out complex changes at scale.

Seth Vargo is an engineer at Google Cloud. Previously, he worked at HashiCorp, Chef Software, CustomInk, and some Pittsburgh-based startups. He is the author of Learning Chef and is passionate about reducing inequality in technology. When he is not writing, working on open source, teaching, or speaking at conferences, Seth advises nonprofits.

Presentations

Modern security best practices for microservices and distributed systems Session

Seth Vargo outlines the key principles for securing microservices and distributed systems in the modern world, where applications run in cloud or hybrid cloud infrastructure.

Kris Vincent manages the developer solutions team at Gannett for USA TODAY NETWORK. A man of many hats, Kris has been everything from an IT director of a local nonprofit to a DevOps engineer for a cloud-based telephony company. He’s a husband, father, Go developer, DevOps sorcerer, and coffee enthusiast, although not necessarily in that order. In his professional life, he’s passionate about automation, clean code, and comfortable collaborative work environments. Outside of work, he’s passionate about his wife and kids, rock climbing, off-roading, politics, STEAM educational programs, comic book superheroes, and archery.

Presentations

From silos to a single pane of glass at USA TODAY NETWORK Session

Three years ago, technical teams at USA TODAY NETWORK were completely siloed, making improvements and troubleshooting difficult and often blind to the rest of the technical organization. Bridget Lane and Kris Vincent explain how drastically the teams' tool belts, thought processes, and goals have changed as the company moved from silos to a single pane of glass.

Heidi Waterhouse is a developer advocate with LaunchDarkly. She delights in working at the intersection of usability, risk reduction, and cutting-edge technology. One of her favorite hobbies is talking to developers about things they already knew but had never thought of that way before. She sews all her conference dresses so that she’s sure there is a pocket for the mic.

Presentations

Disaster resilience the Waffle House way, from flattops to feature flags and more Session

Waffle House's hurricane disaster plan has everything you could want from an IT disaster plan, including contact trees, failover states, and runbooks on partial operation. Heidi Waterhouse shares lessons about state drawn from the world outside computers and explains how to quantify them using a finite state machine and implement them automatically while you are in a less-than-perfect condition.

Cory Watson leads the observability team at Stripe, helping engineers be more confident in their work so they can ship reliable, safe, and performant features for Stripe’s products. Cory has spent over 20 years as a leader, software engineer, SRE, and OSS contributor. Previously, he managed the observability team at Twitter.

Presentations

How to break up with your vendor Session

You're unsatisfied with one of your monitoring providers. You've considered finding a new solution, but the thought of migrating your data off their platform sounds extremely painful. Amy Nguyen and Cory Watson explain how to make a deadline for an infrastructure-critical software migration while ensuring that everyone's requirements are met and no data has been lost.

Sarah Wells is the technical director for operations and reliability at the Financial Times. Her teams build operational and developer tooling and help engineering teams at the FT support the systems they build, including coordination, communication, and learning around major incidents. Previously, Sarah was a developer and tech lead for nearly 20 years. Building a new microservices-based system about five years ago led her to develop a deep interest in operability, observability, and DevOps—and learn a lot about containerization, Kubernetes, and Go in the process.

Presentations

Panel discussion: The future of Kubernetes—Challenges and opportunities Cloud Computing with Kubernetes

Join this panel on the future of Kubernetes, as Sarah Wells, Brendan Burns, Kris Nova, and Alice Goldfuss explore upcoming challenges and opportunities.

Switching horses midstream: The challenges of migrating 150+ microservices to Kubernetes Session

The Financial Times recently migrated its content platform to Kubernetes. Join Sarah Wells to find out what it takes to migrate 150+ microservices from one container stack to another without affecting the existing production users and while the rest of your teams are working on delivering new functionality.

James Wen is a site reliability engineer at Spotify, where he’s currently focused on revamping Spotify’s infrastructure and adopting Kubernetes. Previously, James was the team lead (anchor) of the Cloud Foundry buildpacks team at Pivotal and a core contributor and maintainer of Bundler. James has spoken about buildpacks at Cloud Foundry Summit Europe and Spotify’s infrastructure journey and migrations at QCon New York, KubeCon Europe, and DevfestDC. He loves climbing, whether it’s on gorgeous Fontainebleau slopers or nasty plastic crimps.

Presentations

Migrating Spotify's runtime to Kubernetes

Spotify recently completed the migration of all its services from bare-metal hardware to cloud hosts on GCP. The company is now moving from merely cloud hosted to cloud native by migrating those services to run on Kubernetes. James Wen discusses the work involved, lessons learned, and pitfalls encountered in moving services onto Kubernetes.

Shannon Weyrick is vice president of architecture at NS1. A 20-year veteran of internet infrastructure, Shannon is an accomplished technical architect, developer, and leader whose experience encompasses both development and operations of globally distributed platforms. Previously, Shannon worked at INAP and F5. A regular open source contributor, he has led and worked on a wide range of infrastructure projects from high-performance servers to novel programming languages and runtimes, and he enjoys writing and speaking at industry conferences.

Presentations

Rebuilding the airplane in flight. . .safely Session

Rewriting the key software component of your platform from scratch is always intimidating. Shannon Weyrick and James Royalty discuss NS1's recent DNS server rewrite and outline the steps the company took to roll it out across its globally distributed network with no downtime.

Jamie Wilkinson is a site reliability engineer at Google. He’s a contributing author to the SRE Book and has presented on contemporary topics at prominent conferences such as Linux.conf.au, Monitorama, PuppetConf, Velocity, and SRECon. His interests began in monitoring and the automation of small installations and have continued with human factors in automation and systems maintenance on large systems. Despite his more than 15 years in the industry, he’s still trying to automate himself out of a job.

Presentations

SLO burn Session

Jamie Wilkinson offers a brief overview of SLOs, shares a practical guide to implementing sustainable SLO-based alerting for systems of any size, and outlines the tooling required to supplement the system in the absence of cause-based alerting.

Adam Wolfe Gordon is a software engineer on the storage team at DigitalOcean, working primarily on block storage orchestration. He likes building elegant microservices, continuous deployment, and occasional forays into low-level software such as Ceph and QEMU.

Presentations

Managing multiple sources of truth in distributed applications Session

When building distributed applications, it's highly desirable to maintain a single source of truth, such as a database, for all application state. Unfortunately, for some applications, multiple sources of truth are unavoidable. Adam Wolfe Gordon shares strategies, learned from real-world experience, for managing multiple sources of truth without sacrificing consistency and usability.

Jason Yee is a technical evangelist at Datadog, where he works to inspire developers and ops engineers with the power of metrics and monitoring. Previously, he was the community manager for DevOps and performance at O’Reilly Media and a software engineer at MongoDB. He’s currently exploring the world while living as a nomad and would love to hear about the part of the world that you call home.

Presentations

Canary deploys with Kubernetes and Istio Session

Jason Yee shows how you can more easily test code in production while isolating the effect of potential issues using container orchestration and service meshes.