Engineer for the future of Cloud
June 10-13, 2019
San Jose, CA

Speakers

Hear from innovative programmers, talented managers, and senior developers who are doing amazing things with cloud native and distributed systems. More speakers will be announced; please check back for updates.

Yaniv Aknin is Google Cloud Platform’s lead for quantitative reliability. He works with product managers, developers, and fellow SREs to create availability and performance metrics that accurately model customers’ experience, then optimizes those metrics toward the right reliability/cost point. He’s been an SRE with Google since 2013, working on network infrastructure and several parts of the Google Cloud Platform. He has over two decades’ experience solving business problems in corporate, early startup, government, and nonprofit organizations. Outside of work, he enjoys travel, food, improv theater, and pop-sci, especially behavioral economics.

Presentations

The SRE I aspire to be Keynote

Yaniv Aknin dives into the secret sauce for a successful SRE organization: high-quality measurements of reliability. He explains why measuring reliability is crucial (and why it’s so hard), shares a couple of tips for getting it right, and explores why it’s the key differentiator between SRE and DevOps.

Sean T. Allen is vice president of engineering at Wallaroo Labs and a member of the Pony core team. His turn-ons include programming languages, distributed computing, Hiwatt amplifiers, and Fender Telecasters. His turn-offs include mayonnaise, stirring yogurt, and sloppy code. He’s one of the authors of Storm Applied.

Presentations

Data-corrupting architectures we know and love Session

If we can't seem to get single-process shared data access right, Sean Allen asks, what chance do we have when we use distributed state?

Kyle Anderson is an engineer at Yelp. He’s been building systems to be proud of for more than 10 years.

Presentations

Layers Session

Are we building the right abstraction layers? And how would we know? To answer these questions, Kyle Anderson looks at the past, present, and future of the abstraction layers we've built as an industry.

Dave “Bear” Andrews is chief architect at Verizon Digital Media Services, overseeing the evolution of the Edgecast content delivery network (CDN) and Uplynk video platform. He enjoys low-level security exploitation techniques and has an appreciation for the nuances and resulting surprised faces that accompany discovering failure modes in globally distributed systems. Previously, Dave brought several web security products to market at Verizon Digital Media Services and worked for startups in the Los Angeles area, building security products in the virtualization and CDN spaces. He holds a PhD in computer security from a small university in Australia.

Presentations

Which edge do you need: Managing multiple edges to deliver the next industrial revolution (sponsored by Verizon Digital Media Services) Keynote

Dave Andrews sheds light on how the edge landscape has been—and is still—evolving with a look at the new class of low-latency/high-bandwidth application domains and how Verizon Digital Media Services is helping to deliver this to its customers.

Kolton Andrus is the founder of Gremlin, which helps companies build more robust services. He was a chaos engineer at Netflix, focused on the resilience of the edge services, where he designed and built FIT, Netflix’s failure injection service. Previously, he improved the performance and reliability of the Amazon retail website. At both companies, he served as a “call leader,” managing the resolution of companywide incidents. Kolton is a father of five. He’s passionate about building resilient systems, primarily because it lets him break things for fun and profit.

Presentations

Chaos engineering: When the network breaks Session

Join Tammy Butow to learn how to use chaos engineering to accelerate your understanding of how your network can break (packet loss, black hole attacks, latency injection, and packet corruption) and impact your services.

Ricardo Aravena is an infrastructure manager at Rakuten, helping automate everything in containers using open source and lately contributing to the Kata Containers project. He’s been working in tech for more than 19 years and comes from a diverse professional background, including roles at large companies such as Cisco and VMware as well as startups such as Coupa, Hytrust, Exablox, and SnapLogic. Most recently, he spent two years at Branch Metrics working on automating the company’s cloud infrastructure to handle millions of requests and petabytes of data on a daily basis.

Presentations

Untrusted? No problem: A story on the latest Kubernetes container sandbox mechanisms Session

The last two years have seen the emergence of several mechanisms to isolate workloads in containers as well as Kubernetes's ability to run these in a single multitenant cluster. Ricardo Aravena explores the pros and cons and explains how users can benefit from them.

David Aronchick leads open source machine learning strategy at Azure. He spends most of his time helping humans convince machines to be smarter. (He’s only moderately successful at this.) Previously, he led product management for Kubernetes on behalf of Google, launched Google Kubernetes Engine, and cofounded the Kubeflow project. He’s also worked at Microsoft, Amazon, and Chef and cofounded three startups. When not spending too much time in service of electrons, he can be found on a mountain (on skis), traveling the world (via restaurants), or participating in kid activities, of which there are a lot more than he remembers from when he was that age.

Presentations

How to adopt cloud native machine learning with Kubernetes and Kubeflow Session

Using Kubernetes and Kubeflow, David Aronchick shows how every company, no matter how technical, can use sophisticated machine learning (ML) solutions to transform their businesses while taking advantage of the reliability and portability that cloud native applications can provide.

Mehant Baid is a software engineer at Dropbox. For the past few years, he’s been working on Edgestore, the distributed data store that handles all of Dropbox’s metadata needs. Previously, he worked on the database kernel at Oracle, where he focused on scaling inserts into the database. He’s a committer and project management committee member with the Apache Software Foundation and worked with the open source community to develop Apache Drill—an SQL engine for Hadoop, NoSQL, and cloud storage. His primary interests are the fields of distributed systems and databases.

Presentations

Improving reliability of your distributed data store Session

Mehant Baid covers the challenges Dropbox faced while running Edgestore—a low-latency, distributed data store that serves 10 million requests per second. He shares the technical and cultural changes Dropbox adopted that enabled the company to consistently hit its service level objectives.

Ben Bleything is a developer and sysadmin from Seattle, Washington. He’s best known as one of the world’s leading experts in the emerging field of clown computering. In his spare time, Ben is a developer advocate at Google, where he’s focused on making the experience of operating software on Google Cloud as awesome as possible.

Presentations

Infrastructure and compliance testing with InSpec (sponsored by Google Cloud) Tutorial

Automated software testing has become a standard practice. You probably have a variety of test suites exercising every part of your application. Do you have the same thing for your infrastructure? What about your audit controls? Join Ben Bleything to learn how to use the open source InSpec framework to build infrastructure and compliance tests so you can focus on building the next thing.

Zack Bloom is the director of product for product strategy at Cloudflare. Previously, he was the cofounder of Eager (acquired by Cloudflare in 2016). Zack is the author of the JavaScript used in open source libraries that total more than 50,000 stars on GitHub, are included in Twitter Bootstrap, and are used on over a million websites.

Presentations

Isolate computing Session

The technology invented for web browsers is a much better way of running serverless code than traditional processes and containers. Let Zack Bloom show you why.

Tim Bonci is a lead operations engineer and an IT adrenaline junkie focused on working across teams to enable Vistaprint engineers to deliver software quickly and safely. Tim is passionate about staffing, problem solving, and systems-level thinking. His life before IT was as varied as biotech, broadband engineering, and wine, leading to a jack-of-all-trades approach.

Presentations

How I failed to build a runbook automation system and what I learned Session

You're going to automate all the things, reduce toil, and make your systems smarter and recover automatically. . .except sometimes you're automating a house of cards built on the back of individual people and a well-meaning solution can fail to address the true problems in the system. Tim Bonci offers a postmortem of a solution that was designed to solve a common operational problem but failed.

Nicolas Brousse manages and scales the Adobe Advertising Cloud infrastructure. Previously senior director of operations engineering at TubeMogul and the company’s sixth employee, Nicolas grew TubeMogul’s infrastructure over the past decade from several machines to a few thousand servers that handle hundreds of billions of requests per day for clients like Allstate, Chrysler, Heineken, and Hotels.com.

Adept at adapting quickly to ongoing business needs and constraints, Nicolas leads a global team of site reliability engineers and database architects that monitors Adobe Advertising Cloud infrastructure 24-7 and adheres to DevOps methodology. Nicolas is a frequent speaker at top US technology conferences and regularly gives advice to other operations engineers. Before relocating to the US, Nicolas worked in technology for over 15 years, managing heavy traffic and large user databases for companies like MultiMania, Lycos, and Kewego.

Presentations

Use of self-healing techniques and failure injections to build a reliable service at Adobe Session

In an award-winning paper, Nicolas Brousse and Oleksii Mykhailov found that a distributed infrastructure leveraging public cloud providers and a private cloud with open infrastructure can deliver dynamic advertising content with low latency while preserving high availability. Join them as they present their techniques and demonstrate how to design an ad-serving service that is resilient to failure.

Alex Chen is the senior director of product management at Alibaba Cloud, responsible for storage products and function compute. Previously, he spent over 19 years at IBM, where, among other roles, he was the worldwide business executive for IBM Enterprise Storage XIV, IBM File Storage offerings, and IBM Software Defined Storage offerings. He also led various business development efforts for IBM, including the acquisition of Texas Memory Systems in 2012 and the divestiture of System X to Lenovo in 2014. He holds six US patents from his early career as a storage software development engineer, when he worked on enterprise block storage, tape systems, and storage management software development.

Presentations

Cloud native storage behind the biggest 1-day shopping event in the world Session

On November 11, 2018, more than $30.5B of goods from over 180 thousand brands were sold in one day through one platform, with peak message requests of 1.72B per second. Alex Chen details Alibaba's cloud storage infrastructure that supports this level of data velocity, variety, and volume.

Amy Chen is a systems software engineer at VMware through the Heptio acquisition. She’s passionate about Kubernetes, Go, containers, and distributed systems. In her free time, she also runs a YouTube channel, Amy Codes, that discusses software engineering and distributed systems topics.

Presentations

Sharing is caring: Your Kubernetes cluster, namespaces, and you Session

As the number of teams, clusters, and namespaces grows within an organization, it becomes increasingly difficult to maintain any kind of coherence. Amy Chen discusses how, by aligning identity, resource limits, and your application’s security posture, cluster operators can get more organizational mileage out of Kubernetes namespaces.

Guilin Chen is a software engineer at Facebook, where he works on mobile performance and leads the Instagram efficiency team.

Presentations

Serving billions of Instagram users efficiently Session

Instagram is widely used across the world and serves billions of users every day. Guilin Chen and Shobhit Kanaujia pull back the curtain on how Facebook operates Instagram efficiently at scale.

Uma Chingunde leads the compute organization at Stripe, a team within the company’s larger infrastructure organization that supports all Stripe engineers and customers by providing a reliable and scalable platform. Uma started her career in the distributed systems space, working on key virtual machine technology at VMware as an engineer and then manager. Previously, she was an engineering manager at Delphix, supporting the core teams building the Delphix virtual appliance. She has a master’s degree in computer science from Johns Hopkins University.

Presentations

Navigating the midcareer plateau Session

As both engineers and managers reach midcareer levels referred to as career or terminal levels (e.g., senior engineer or senior manager levels in many technology companies), they are often faced with uncertainty and ambiguity on possible next steps in their career. Uma Chingunde focuses on career planning and strategy for midcareer technologists.

Ian Coldwater is a DevSecOps engineer turned red teamer who specializes in breaking and hardening Kubernetes, containers, and cloud native infrastructure. In their spare time, they like to go on cross-country road trips, capture flags, and eat a lot of pie. Ian lives in Minneapolis and tweets as @IanColdwater.

Presentations

Crafty requests: Deep dive into a Kubernetes CVE Session

You may have heard about CVE-2018-1002105, one of the most severe Kubernetes security vulnerabilities of all time. But how does this flaw work? How can it be exploited, and what does it all mean? Ian Coldwater takes a deep dive into the exploit to explain the risks and gives you practical advice about how to protect your clusters.

Jennifer Davis is a senior cloud advocate at Microsoft. Previously, she was a principal site reliability engineer at RealSelf and developed cookbooks to simplify building and managing infrastructure at Chef. Jennifer is the coauthor of Effective DevOps and speaks about DevOps, tech culture, and monitoring. She also gives tutorials on a variety of technical topics.

Presentations

The Ops in the Serverless Session

Jennifer Davis examines the increased need for specialized operations engineering in the age of serverless.

Sebastien Deleersnyder is a cofounder and managing partner of Toreon, providing professional ICT security services to customers in Belgium and abroad. As security project leader and information security officer, he’s built up extensive experience in information security-related disciplines, both at strategic and tactical levels. He specializes in application security, combining his software development and information security experience. He’s performed several successful secure development lifecycle projects in the financial and utility sectors, started up software security groups, supported customers in selecting and implementing web application firewalls (WAF), delivered web application security training, and closed a lot of audit findings regarding application security. Sebastien started the Belgian Open Web Application Security Project (OWASP) chapter as chapter leader, was a member of the OWASP Foundation board, and gave several public presentations on web application and web services security. He also co-organized the yearly BruCON security and hacker conference and trainings in Belgium.

Presentations

Hands-on threat modeling and tooling for DevSecOps 2-Day Training

Sebastien Deleersnyder teaches you how to use threat modeling to integrate security in the DevOps workflow, introduces threat modeling as code, and shows you how to build a security culture in your organization.

Kevin Dunne is the GM of Tricentis Flood, ensuring his team’s continued commitment to innovation and delivering tools for creating software that scales. With a deep interest in the emerging trends in software development and testing, Kevin is dedicated to collaborating with thought leaders in this space. Previously, Kevin was at Deloitte, where he managed testing on large government and Fortune 500 engagements delivering enterprise resource planning (ERP) implementations and custom software development. As one of the first employees at QASymphony, Kevin saw many facets of the business, working in sales, customer support, marketing, and product management. Kevin holds a BS from Vanderbilt University.

Presentations

Continuous load testing: A journey to performance at scale (sponsored by Tricentis) Session

Applications are subject to intense scrutiny over their performance, with research showing that more than one-third of users will leave an app and never return if its performance is not satisfactory. Whether you’re just getting started with your journey or looking to take your performance testing to the next level, you won’t want to miss Kevin Dunne highlighting all of the latest tips and tricks.

Alex Elman is a site reliability engineer at Indeed. He’s studied and practiced resilience engineering at Indeed for seven years with the goal of making failure within distributed systems a boring nonevent. Even after moving into a leadership role, Alex continues to carry a pager, believing that incident response is a valuable learning opportunity.

Presentations

Learning from failure: Why a total site outage can be a good thing Session

Alex Elman explains how Indeed used a site-wide outage as an opportunity to build resilience, improve reliability, and make lasting improvements to the engineering culture.

Lachlan Evenson is a principal program manager on the Azure Containers team at Microsoft. He has spent the last two and a half years working with Kubernetes and enabling cloud native journeys. Lachie serves as a cloud native ambassador and TOC contributor and has deep operational knowledge of many cloud native projects.

Presentations

Community projects inform enterprise products (sponsored by Microsoft Azure) Keynote

Lachlan Evenson and Bridget Kromhout discuss Microsoft’s journey to build Gatekeeper, a Kubernetes policy controller, in the open and explain how the tool helped inform an enterprise offering on Azure. Join in for pragmatic tips on how to effectively contribute to and use open source tools.

Rustem Feyzkhanov is a machine learning engineer at Instrumental, where he creates analytical models for the manufacturing industry. Rustem is passionate about serverless infrastructure (and AI deployments on it) and has ported several packages to AWS Lambda, ranging from TensorFlow, Keras, and scikit-learn for ML to PhantomJS, Selenium, and WRK for web scraping.

Presentations

Serverless architecture for data science Session

Machine learning and deep learning are becoming increasingly essential to many businesses, for both internal and external use. One of the main deployment challenges is finding the right way to operationalize models within the company. A serverless approach to deep learning provides a cheap, simple, scalable, and reliable architecture for this. Rustem Feyzkhanov shows you how to build it on popular AWS infrastructure.

Kat Fitzgerald is a principal security architect at Uber ATG, taking on the challenge of IoT, cloud, and k8s security architectures and engineering. She has (many) years of experience in the security field, with an emphasis on security operations, incident response, and purple teams. Previously, she spent five years at Apple in Cupertino as a senior security architect/engineer. Based in Pittsburgh and a natural creature of winter, she can typically be found sipping Casa Noble Anejo while simultaneously defending her systems against a barrage of attackers using OSS, magic spells, and dancing flamingos. Running IoT honeypots on k8s clusters built on Raspberry Pis has upped the ante on her security research toolbox.

Presentations

Intro to Kubernetes security; or, Taming the Great Spaghetti Monster Session

Kat Fitzgerald walks you through building and maintaining a secure Kubernetes environment.

Liz Fong-Jones is a developer advocate, labor and ethics organizer, and site reliability engineer (SRE) at Honeycomb with more than 15 years of experience. Previously, she was an SRE at Google, working on products ranging from the Google Cloud Load Balancer to Google Flights. She lives in Brooklyn with her wife, metamours, and a Samoyed/Golden Retriever mix, and in San Francisco and Seattle with her other partners. She plays classical piano, leads an EVE Online alliance, and advocates for transgender rights as a board member of the National Center for Transgender Equality.

Presentations

Cultivating production excellence Keynote

Join Liz Fong-Jones to explore several practices core to production excellence: giving everyone a stake in production, collaborating to ensure observability, measuring with service level objectives, and prioritizing improvements using risk analysis.

Lorenzo Fontana is an open source software engineer at Sysdig, where he primarily works on Falco, a Cloud Native Computing Foundation (CNCF) project that provides container runtime security and anomaly detection. He’s passionate about distributed systems, software-defined networking, the Linux kernel, and performance analysis. He’s the maintainer of the IO Visor Project’s kubectl-trace.

Presentations

eBPF-powered distributed Kubernetes performance analysis Session

The extended Berkeley Packet Filter (eBPF) ecosystem can be hard to wrap your mind around. Lorenzo Fontana is here to help you understand it while applying eBPF programs to nodes and resources of a Kubernetes cluster.

Dr. Nicole Forsgren is VP of Research & Strategy at GitHub. She is the author of the Shingo Publication Award-winning book Accelerate: The Science of Lean Software and DevOps and is best known as the lead investigator on the largest DevOps studies to date. She has been a successful entrepreneur (with an exit to Google), a professor, a performance engineer, and a sysadmin. Her work has been published in several peer-reviewed journals.

Presentations

Why should I care about DevRel anyway? Session

Emily Freeman and Nicole Forsgren dive into what the heck DevRel is and why you (yes, you) are actually a DevRel too—whether you know it or not. You’ll walk away from this game show—talk, that is—with a smile on your face and a deeper understanding of the ins and outs of technical advocacy and how developer relations benefit you as an engineer.

Gustavo Franco is a customer reliability engineer at Google working on learning more about, helping to define, and expanding the reach of SRE. In his 12 years at Google, he’s started, moved, and managed several SRE teams including Google Plus Frontend, BreakFix, Horizon Web, Cluster Turnups, Apps Media, Apps Messaging, G Suite, and Cloud Identity.

Presentations

Scaling SRE organizations: The journey from 1 to many teams Session

A lot has been said about the SRE profession (how to start an SRE team, how to scale a single team in place, etc.), but how to move from a single SRE team to an SRE organization that requires several teams has been largely unexplored. Gustavo Franco takes new SRE leaders and individual contributors through what it takes to be a part of or start their second team and beyond.

After many years of ghostwriting, Emily Freeman made the bold (ridiculous?) choice to switch careers into software engineering. Emily is the author of DevOps for Dummies (April 2019) and the curator of JavaScript January. A former VP of developer relations, Emily is a cloud ops advocate at Microsoft and lives with her daughter in Denver, Colorado.

Presentations

Why should I care about DevRel anyway? Session

Emily Freeman and Nicole Forsgren dive into what the heck DevRel is and why you (yes, you) are actually a DevRel too—whether you know it or not. You’ll walk away from this game show—talk, that is—with a smile on your face and a deeper understanding of the ins and outs of technical advocacy and how developer relations benefit you as an engineer.

Karthik Gaekwad is a cloud native developer advocate at Oracle Cloud Infrastructure. A veteran engineer, he enjoys building software products from scratch using cloud and container technologies. Previously, he worked in both large enterprises and startups, including National Instruments, Mentor Graphics, Signal Sciences, and StackEngine (acquired by Oracle). He organizes several conferences including Devopsdays Austin and Container Days, and he’s an accomplished author for LinkedIn Learning. Karthik holds an MS in computer engineering from the University of Arizona. In his free time, he enjoys spending time with family in his hometown of Austin, Texas, dabbling in new product ideas, and blogging.

Presentations

Security in the FaaS lane (sponsored by Oracle Cloud Infrastructure) Session

Karthik Gaekwad walks you through security strategies and pitfalls viewed through a serverless lens. You'll leave with a solid understanding of how to approach security conversations about serverless applications in the enterprise.

Sébastien Goasguen built his first compute cluster while working on his PhD in the late ’90s, when they were still called Beowulf clusters; he’s been working on making computing a utility ever since. He’s focused on containers and container orchestration, founding the Kubernetes startup Skippbox, where he created kompose, Cabin, and kubeless. Active in the serverless community, he cofounded TriggerMesh, a serverless management platform built on top of Kubernetes and Knative. He can be found hiking the Jura or at open source conferences. He’s the author of the Docker Cookbook and coauthor of the Kubernetes Cookbook.

Presentations

Certified Kubernetes Application Developer (CKAD) prep for exam 2-Day Training

Can you develop and maintain applications using Kubernetes? That’s the question more employers are asking these days. Take the next step in your career by becoming a Certified Kubernetes Application Developer at Velocity. You get a full day of test prep from Sébastien Goasguen, O’Reilly’s top Kubernetes trainer, and the opportunity to take the exam onsite.

Deploying serverless applications to any cloud with Knative Tutorial

Priyanka Sharma and Sébastien Goasguen walk you through deploying serverless functions to any cloud provider of your choice, breaking the shackles of vendor lock-in.

Chen Goldberg is a technology leader with 20 years of experience leading engineering teams. She’s the engineering director at Google Cloud, where she leads the Anthos engineering team, including Google Kubernetes Engine (GKE) and the OSS Kubernetes and Istio projects teams, helping development teams to increase their agility and modernize workloads. Her team is committed to enabling an open cloud and ensuring universal adoption of Kubernetes and Istio through ongoing community leadership and technical innovation with an emphasis on users’ needs. Chen lives in Sunnyvale, California, with her husband and three kids. Outside of work she enjoys hiking and making desserts.

Presentations

Scaling teams with technology (or is it the other way around?) Keynote

Microservices and cloud native technologies are the path to building large-scale, distributed systems. Can they do the same for teams? Chen Goldberg, who leads the Google engineering team building Kubernetes, Istio, GKE, and Anthos, explains how the same tech can help build happy teams.

Clare Gollnick is the director of data science at NS1, an industry-leading DNS and traffic management platform. An expert on statistical inference and machine learning, Clare writes and speaks often on the intersection of data, philosophy, and entrepreneurship. Previously, as chief technology officer of Terbium Labs, Clare led a diverse team of engineers and data scientists. Her team built innovative information security products, preventing fraud while still protecting consumer privacy. Clare has published a number of academic papers on information processing within neural networks, validation of new statistical methods, and the philosophy of science. Clare holds a PhD from Georgia Tech and a BS from UC Berkeley.

Presentations

Extracting signal: Fast traffic analysis and smart aggregation in global edge networks Session

Many platforms require a deep understanding of client traffic. But efficiently divining the desired signal from the continuous stream of traffic in at-scale edge networks is an enormous challenge. Shannon Weyrick and Clare Gollnick discuss strategy and technology for analysis and aggregation at the edge, plus centralized collection, all based on real-world use cases.

Lena Hall is a senior software engineer and developer advocate at Microsoft working on Azure, where she focuses on large-scale distributed systems and modern architectures. Lena has more than 10 years of experience in software engineering with a focus on distributed cloud programming, real-time system design, highly scalable and performant systems, big data analysis, data science, functional programming, and machine learning. Previously, she was a senior software engineer at Microsoft Research. She’s an elected member of the F# Software Foundation’s board of trustees, co-organizes a conference called ML4ALL, and is often an invited member of program committees for conferences like Kafka Summit, Lambda World, and others. Lena holds a master’s degree in computer science.

Presentations

Channel into the universe of eventually perfect distributed systems Keynote

Lena Hall takes you on an adventure into the multifaceted universe of ever-changing distributed systems.

Everett Harper is the CEO and cofounder of Truss, which builds software and infrastructure to help companies scale and enterprises and public agencies modernize their digital services. Notable projects include Healthcare.gov, Nuna, and DOD Transcom. Previously, Everett worked at innovative companies in tech, such as Linden Lab (maker of Second Life), and in social impact, including Self-Help Community Development Financial Institution (CDFI). Everett started his career at Bain & Company, a top strategy consulting firm. He’s a board member of CARE.org and CASE at Duke Fuqua School of Business. He’s written for Forbes, Fortune, and TechCrunch and has been a featured speaker at Dent, Techstars, Dreamforce, and Women 2.0. Everett holds an MBA and MEd from Stanford University and a BSEE in biomedical engineering from Duke University, where he was an AB Duke Scholar. He won the NCAA National Championship in soccer and was inducted into the North Carolina Soccer Hall of Fame in 2019. Everett lives in Oakland, California, making limoncello when life hands him lemons.

Presentations

Infrastructure first: Because solving complex problems needs more than technology Keynote

Drawing from work in technology, community development finance, social psychology, complexity theory, and championship sports, Everett Harper moves to the edge of these disciplines, centering on the key practices that are crucial for solving our most critical challenges.

Andrew Harvey is the CTO in residence in Sydney at Microsoft, where he helps startups of all sizes scale their products and technology. Previously, he was CTO at several startups, including the Apple Design Award-winning app Zova.

Presentations

Your team as a distributed system Session

Many technical leaders find themselves in leadership without any formal training. Andrew Harvey asks, What if you used your understanding of distributed systems to understand your team and how to scale it?

Steve Heffernan is the creator of Video.js, an open source web video player in use on over 1 million websites and with over 1 billion videos played per month. He’s currently leading product at Mux, building video performance monitoring and APIs for video streaming. Steve has been helping run the SF Video Tech Meetup and the Demuxed video engineering conference for the last five years.

Presentations

Build HQ Trivia (better than HQ) Session

Streaming live video at low latency with user interaction laid on top is hard. Steve Heffernan explains how to make it a lot easier with standards-based approaches and existing network technology.

James Heimbuck is the product manager for infrastructure at SendGrid, where he’s working with the company’s tech ops and internal tools teams to deliver platforms, products, and services that enable its delivery teams to create and deliver code for its 78,000 customers. He previously spent 14 years in B2B and B2C product management roles at companies such as Webroot, Bluprint (formerly Craftsy), and HomeAdvisor.

Presentations

Product management and DevOps, together at last and kicking butt Session

DevOps and platform teams have too many projects, not enough time, and users who can easily ask if the thing is done, because "it's really holding them up." James Heimbuck explores the good, the bad, and the ugly of how SendGrid incorporates product management practices into planning and execution within DevOps and platform teams to curb scope creep, avoid never-ending projects, and realize value.

André Henry is a systems engineer at Venmo and a lifelong hacker. If you ever wondered who would try to install a supercomputer in an NYC apartment, you’ve thought about him. André lives at the intersection of lasers, cats, and tech. You can find him at a bookstore or conference, always learning and sharing.

Presentations

This should be faster; what's going on? Session

Software and applications run on a real physical network. Trying to deliver packets across the data center or around the world can pose unique challenges depending on the application and performance requirements because modern networks are a complicated mix of technologies. André Henry explores the sources of latency on the network and explains how to mitigate them.

Nivia S. Henry fundamentally believes that happy people, working in a healthy environment, will produce great outcomes. This is the philosophy behind her 15-plus-year career creating structures in which high-performing teams thrive. Today, Nivia plies her trade as a manager of engineering managers at Spotify. Her career path has included nearly every role in tech, but her true passion is inspiring people to do their best work. Nivia has cochaired one of the largest tracks for Agile Alliance, organized meetups, and spoken at conferences of all sizes. Her hobbies include being an overbearing mom to a gorgeous cat and traveling with her awesome husband, Andre. You can find her on Twitter and LinkedIn.

Presentations

Ghost in the machine: The unintended consequences of bias in machine learning Session

Machine learning bias comes from our lack of understanding of our own biases. Nivia Henry puts that into focus and offers practical solutions to mitigate such biases.

Patrick Higgins is a UI developer at Gremlin, where he helps developers unleash the power of controlled chaos. He’s passionate about finding effective ways to make UIs resilient to failure. He fills his weekends with playing soccer and music, reading, and assisting with civic causes that he cares about.

Presentations

Chaos breeding confidence: Broader implications of chaos engineering Session

Chaos engineering provides a mechanism for us to discover vulnerabilities in our infrastructure and applications by proactively seeking them out. While this is a great starting point, Patrick Higgins shares how you can apply its practices to holistically enhance the resilience of your organizations and products.

Daniel Hochman is a senior infrastructure engineer at Lyft. He’s passionate about scaling innovative products and processes to improve quality of life for those inside and outside of the company. During his time at Lyft, he has successfully guided the platform through an explosion of product and organizational growth. He wrote one of the highest-throughput microservices and introduced several critical storage technologies. Daniel most recently guided the rollout of Envoy-Redis at Lyft, which included a full on-the-fly replacement of Lyft’s high-throughput caching infrastructure. Daniel leads traffic networking at Lyft and is responsible for designing and implementing Lyft’s frontend infrastructure to scale for increasing numbers of diverse clients.

Presentations

How Lyft migrated to a service mesh with Envoy Session

Lyft has made the transition from a single monolithic service to 300+ microservices by leveraging Lyft's open source proxy Envoy. Daniel Hochman and Jose Nino explain how Lyft migrated from a legacy monolithic application to over 300 microservices while keeping drivers, passengers, and developers happy.

Lorin Hochstein is a senior software engineer on the cloud operations and reliability engineering (CORE) team at Netflix, where he works on ensuring that Netflix remains available. Previously, he was a senior software engineer at SendGrid, lead architect for cloud services at Nimbis Services, computer scientist at the University of Southern California’s Information Sciences Institute, and assistant professor in the Department of Computer Science and Engineering at the University of Nebraska-Lincoln. Lorin holds a BEng in computer engineering from McGill University, an MS in electrical engineering from Boston University, and a PhD in computer science from the University of Maryland.

Presentations

Move fast and learn from incidents Tutorial

Ryan Kitchens, Lorin Hochstein, and Nora Jones discuss incident management and explore effective approaches and techniques that help you build the capacity to encounter failure and manage the consequences of failure successfully.

Jon Hodgson is the principal scientist for APM at Riverbed Technology. For over a decade, Jon has helped hundreds of organizations around the world optimize the reliability and performance of their mission-critical applications. With a background in data science, application architecture, systems administration, networking, and programming, Jon employs a multidisciplinary approach to troubleshooting, enabling him to analyze and solve some of the most challenging performance issues in complex modern environments. When he’s not obsessing about data visualization and making things perform faster, Jon enjoys digging things up with his tractor at his home in Missouri.

Presentations

25 billion transactions and counting: How Dell manages application performance at scale (sponsored by Riverbed) Session

The scale of cloud native environments can be overwhelming. How do you employ cutting-edge technology to ensure the best app performance for customers executing thousands or even billions of transactions a day? Jon Hodgson, Jeremy Tupa, and Marcelo Soares share practical advice you can apply to your own environment, whether large or small.

Thomas Jackson is the head of core and data infrastructure at Wish, where he’s responsible for online serving infrastructure (service discovery, container orchestration, databases, etc.). In his decade-plus of experience in information technology, he’s cultivated a specialty in high-performance, distributed, and reliable systems. Among his accomplishments, he was a founding member of the traffic SRE team at LinkedIn, where he built and scaled the company’s edge infrastructure.

Presentations

Katalog-sync: Reliable integration of Consul and Kubernetes (sponsored by Wish) Session

Consul is a well-known and widely used service discovery mechanism. Although Kubernetes has a built-in service discovery mechanism, Wish has standardized on using Consul. Thomas Jackson explains how Wish is leveraging Kubernetes and integrating it with its infrastructure.

Anil Jacob is a lead software engineer on the frontier scale team at Salesforce, where he works on large and complex customer implementations and related scale challenges. Previously, he was at Intuit, BEA WebLogic, and Wells Fargo. Anil’s interests are application scale, user experience, UX performance, and application development.

Presentations

Scale data access with app layer caching (sponsored by Salesforce) Session

Databases are costly but critical for application health. Protecting them allows applications to scale with demand, reducing both hardware dependency and costs. Anil Jacob explains how to use application layer caching to cushion shared resources when there are frequent requests for data that doesn't change often, enabling businesses to scale well, provide good user experience, and reduce costs.

Nora Jones practices chaos engineering and human factors at Slack and is a student of human factors and systems safety at Lund University. She’s passionate about resilient software, people, and the intersection of those two worlds. She cowrote the book on chaos engineering with her teammates while working at Netflix and keynoted at AWS re:Invent in 2017 to an audience of over 40,000 people about the technical benefits and business case behind implementing chaos engineering.

Presentations

Move fast and learn from incidents Tutorial

Ryan Kitchens, Lorin Hochstein, and Nora Jones discuss incident management and explore effective approaches and techniques that help you build the capacity to encounter failure and manage the consequences of failure successfully.

Maya Kaczorowski is a product manager in security and privacy at Google, focused on container security. Previously, she worked on encryption at rest and encryption key management. She was also an engagement manager at McKinsey & Company, working in IT security for large enterprises, and she completed her master’s in mathematics, focusing on cryptography and game theory. She’s bilingual in English and French. Outside of work, Maya is passionate about ice cream—making ice cream for friends at home, attending the Penn State Ice Cream Short Course in January 2014, and researching ice cream headaches. She also enjoys puzzling, running, and reading nonfiction.

Presentations

Containers can actually improve your security story Session

Maya Kaczorowski explores how containers offer a fundamentally different, possibly better, security model than you're used to. They enable you to patch your environment more easily, identify when you're affected by a new vulnerability, and enforce governance for what is deployed in your environment.

Shobhit Kanaujia is an engineer at Facebook who specializes in full stack performance and efficiency at scale.

Presentations

Serving billions of Instagram users efficiently Session

Instagram is widely used across the world and serves billions of users every day. Guilin Chen and Shobhit Kanaujia pull back the curtain on how Facebook operates Instagram efficiently at scale.

Michael Kehoe is a site reliability engineer at LinkedIn, where he specializes in building and maintaining reliable, scalable system infrastructure. Previously, he worked with networks at the University of Queensland, built small satellites at NASA, and wrote thermal environments software at Rio Tinto.

Presentations

Getting started with eBPF Tutorial

Michael Kehoe gets you up and running on the extended Berkeley Packet Filter (eBPF). Join in to learn what eBPF is, how it works, how to use it, and how to program against it with a set of labs and plenty of preread material to get you up to speed.

Jessica Kerr is a developer at Atomist, where she makes development automation. She also podcasts on Greater Than Code and Arrested DevOps, and she speaks at software conferences around the world. Currently, she’s into resilience engineering, domain-driven design, and the practice of symmathesy: collaboration through mutual learning between us and the software we make.

Presentations

From puzzles to products Keynote

Jess Kerr argues that most programming careers aren't about writing software; they're about changing it. With this distinction, she'll share some things about reuse, delivery, quality, and how to grow as a programmer.

Ryan Kitchens is a site reliability engineer on the CORE team at Netflix, where he works on building capacity across the organization to ensure its availability and reliability. Previously, Ryan was a founding member of the SRE team at Blizzard Entertainment.

Presentations

How did things go right: Learning more from incidents Session

Join Ryan Kitchens for an introduction to Safety-II concepts that can help move the industry forward by increasing opportunities to learn from success, with fundamental, practical ways to get us from "Why did things go wrong?" to "How did things go right?"

Move fast and learn from incidents Tutorial

Ryan Kitchens, Lorin Hochstein, and Nora Jones discuss incident management and explore effective approaches and techniques that help you build the capacity to encounter failure and manage the consequences of failure successfully.

Bridget Kromhout is a principal cloud advocate at Microsoft. Her CS degree emphasis was in theory, but she now deals with the concrete (if the cloud can be considered tangible). After 15 years as an operations engineer, Bridget traded being on call for being on a plane. A frequent speaker and program committee member for tech conferences, she leads the Devopsdays organization globally and the DevOps community at home in Minneapolis, Minnesota. She podcasts with Arrested DevOps, blogs at Bridgetkromhout.com, and is active in a Twitterverse near you.

Presentations

Community projects inform enterprise products (sponsored by Microsoft Azure) Keynote

Lachlan Evenson and Bridget Kromhout discuss Microsoft's journey to build the Kubernetes policy controller Gatekeeper in the open and explain how the tool helped inform an enterprise offering on Azure. Join in for pragmatic tips on how to effectively contribute to and use open source tools.

Kubernetes for the Impatient Keynote

Everyone keeps telling you that containers need orchestration, but you're not so sure; maybe they could go for some light jazz? Or maybe serverless is here to save us from the tyranny of (virtual) machines, but meanwhile somebody's gotta kuber some netes, and it's likely to be you.

Will Larson leads Stripe’s infrastructure engineering team, which provides reliable, performant, and usable platforms and tools for Stripe’s engineers and users. At Stripe, he’s been part of the development of Veneur and Sorbet and the migrations to Kubernetes, Envoy, and Bazel, and has helped provide the infrastructure for the launch of exciting new products like Terminal and Issuing. Previously, Will supported engineering teams at Uber and Digg.

Presentations

How Stripe invests in technical infrastructure Session

Will Larson explores how Stripe has evolved its approach to prioritizing technical infrastructure as the company grew from two founders to millions of users and 1,300+ employees.

Yoni Leibowitz is a software engineer on the Azure Data Explorer (Kusto) team, which he joined when the service was still in an early stage of incubation. While staying very hands-on and focusing on the platform’s data engine, Yoni maintains a customer-obsessed mindset, works closely with customers at all scales, and supports the growing community of users of the service and the Kusto query language.

Presentations

The telemetry data revolution at Microsoft (sponsored by Microsoft) Session

Interested in becoming more data driven and empowering your peers and coworkers with insights and data? Yoni Leibowitz and Sasha Rosenbaum share how Microsoft has been constantly transforming its engineering, support, finance, and marketing work via new tech for data-driven decisions.

Jenny Liao is a software engineer in Google’s Pittsburgh office. A Carnegie Mellon alumna, she has a healthy dose of Pittsburgh pride. Jenny is passionate about distributed systems design and is always excited to connect with more people. She enjoys painting, singing, and playing with dogs in her free time.

Presentations

SRE classroom: How to design a reliable application in three hours Tutorial

Explore the key concepts behind large system design with Jenny Liao as she guides you through building, scaling, and provisioning a system. Apply the concepts you learn to evaluate and build systems of your own. You’ll work in small groups.

Beth Adele Long abandoned a potential career as a rocket scientist to tinker with websites. She’s currently a DevOps solutions strategist for New Relic and the project lead for New Relic’s collaboration with the SNAFUcatchers industry consortium. She’s obsessed with joint cognitive systems and good pens.

Presentations

Having the bubble: How your experts build, maintain, and spread deep system knowledge Session

We fret about how to break system knowledge out of knowledge silos—the expert individuals with a deep intuitive understanding of our complex systems. Beth Long explains how those experts represent both a vulnerability and a strength and why understanding them as a key mechanism in your larger systems helps you harness their power and protect against fragility.

Laura Maguire studies human performance in high-risk, high-consequence work. As a researcher with the SNAFU Catchers Consortium, she has spent the last two years studying critical digital infrastructure and the teams tasked with keeping it running. She has a master’s degree in human factors and systems safety and is currently completing her PhD in cognitive systems engineering at the Ohio State University.

Presentations

Lowering costs of coordination during service outages: A multiple case analysis Session

DevOps squads coordinate in almost every aspect of their work. Laura Maguire explores how high-performing teams responding to service outages demonstrate sophisticated, nuanced practices that ease the cognitive burden of coping with complex, time-pressured incidents.

Charity Majors is the cofounder and CTO of Honeycomb, a startup that provides the first and (thus far) only observability solution for modern systems. She’s the coauthor of Database Reliability Engineering (the unicorn book) and previously worked at Parse, Facebook, and Linden Lab. She tests in prod.

Presentations

Test in production: Yes, you can (and you should) Session

Charity Majors explains why the only environment that matters is production. For the good of humanity, ditch the rest.

Jonathan Maltz is a software engineer at Nuna, working to make high-quality, affordable healthcare available to every American. Previously, he was an Android developer and has progressively moved back the stack to his role at Nuna, where he builds backend applications and infrastructure. Along the way, he’s cultivated a love for fostering healthy communication between teams and their external stakeholders.

Presentations

ZOMG I’m leading a project? Session

Leading a project requires reorienting your priorities in new and sometimes unintuitive ways; Jonathan Maltz explains how your responsibilities change when you start leading a project and how you can successfully adapt to that change.

Heather Martin joined the cloud integration and automation team at Discover in November 2016. Since then, her passion for cloud computing has grown from optimizing Discover’s private cloud and driving adoption of infrastructure as code to supporting efforts to expand and integrate into AWS. Heather leads a public infrastructure as code meetup and has started an internal meetup within Discover to help the technology organization drive its 2020 technology strategy. Heather has worked in IT for more than 15 years, spanning various infrastructure services roles and most recently expanding into leadership of more than five of them. She has an undergraduate degree in computer science and is currently pursuing her MBA from Lake Forest College. When not sharing her passion for cloud computing, Heather enjoys running and is a site coordinator for Girls on the Run (GOTR) and race director of an annual anti-human trafficking 5K.

Presentations

Teaching old dogs new tricks: Infrastructure as a product Session

Enterprise infrastructure is unpredictable, and being agile means working harder and faster to complete project after project to keep the business moving forward. You do very little to improve the solutions you provide to your customers, and this model just doesn't scale. Heather Martin describes the journey of moving from a project to a product mind-set to transform how we deliver infrastructure.

Darren McCleary is a senior software engineer on the games team at the New York Times focused on building reliable distributed systems. He’s been a full-time Go developer since joining the company. He also sits on the Times’ Architecture Review Board, where he helps other teams design and build new systems.

Presentations

Clarity and confidence: Observability on the New York Times games team Session

The mantra with Git is "commit early, commit often." With deep insight into your applications, you can deploy early and deploy often. Darren McCleary explores how the New York Times games team fearlessly pushes changes to production and monitors their impact for 400,000+ crossword subscribers and how the team rapidly drills into issues when they occur.

Nikki McDonald is a content director at O’Reilly Media, where she writes, edits, and works with the industry’s leading practitioners to develop books, online courses, and training videos to help engineers and developers collaborate more effectively and create and deploy complex distributed systems. She also cochairs O’Reilly’s Velocity Conference, held annually in San Jose, New York, and London. Nikki started out as a features editor at MacUser magazine back when people were still dialing up to the internet with AOL. She lives in Ann Arbor, MI.

Presentations

Thursday opening welcome Keynote

Program chairs Nikki McDonald, Ines Sombra, and James Turnbull open the second day of keynotes.

Wednesday opening welcome Keynote

Program chairs Nikki McDonald, James Turnbull, and Ines Sombra open the first day of keynotes.

Patrick Meenan is a software engineer at Facebook, where he’s helping make the web faster. Patrick has been working on web performance in one form or another for the last 25 years. Previously, he worked at Cloudflare and Google to make Chrome and the web faster. Patrick created the popular open source WebPageTest web performance measurement tool, runs the free instance of it at WebPagetest.org, and can frequently be found in the forums helping site owners understand and improve their website performance.

Presentations

Fixing the performance of your (probably broken) HTTP/2 deployment Tutorial

Patrick Meenan lets you in on how HTTP/2 prioritization is effectively broken in most deployments and shows you how to detect, debug, and fix the issues.

Russ Miles is CEO of ChaosIQ.io, where he and his team build commercial and open source products and provide services to companies applying chaos engineering to build confidence in the resilience of their production systems. Russ is an international consultant, trainer, speaker, and author. His most recent book, Antifragile Software: Building Adaptable Software with Microservices, explores how to apply chaos engineering to construct and manage complex, distributed systems in production with confidence.

Presentations

Fast track to chaos engineering 2-Day Training

Build confidence in your systems' behavior and identify weaknesses before they happen. Join Russ Miles on a deep dive into chaos engineering and learn how to apply it in your organization.

Richard Moot (pronouns: he/him) is a developer evangelist at Square. He has a love-hate relationship with JavaScript. When he’s not programming, gaming, or reading, he’s probably just walking his dog.

Presentations

Build a backend with TypeScript using Nest.js (sponsored by Square) Session

TypeScript is overtaking the JavaScript world. Nest.js is a progressive Node.js framework for building efficient, reliable, and scalable server-side applications using TypeScript. It's modular, testable, and very similar in structure to Angular but built for backends. Richard Moot covers how Nest.js makes things better when building a TypeScript app.

Amanda Moran is a Bay Area-based developer advocate at DataStax. Her passion is bridging the gap between customers and engineering. Previously, she worked for HP, Lockheed Martin, Teradata, and Apache Trafodion startup Esgyn. Amanda’s an Apache Committer and member of the PMC for Apache Trafodion. She’s worked on customer POCs, executive demos, distributed database cloud deployments, Python coding, data science workshops, conferences, Linux/Hadoop administration, and scripting: a little bit of everything. She has a master’s degree in computer science from Santa Clara University and a BS in biology from the University of Washington. In her spare time, she loves running, hanging out with her dog, and finding reasons to go to Disneyland.

Presentations

Data modeling in the 24th and a half century with Apache Cassandra Session

The future is here and the future needs more than your basic relational databases. Amanda Moran explains Apache Cassandra data modeling, how to do it right, and how you can be successful with cloud native distributed databases by avoiding common mistakes.

Oleksii Mykhailov is a senior SRE at Adobe and has been a key contributor to Adobe Advertising Cloud, which handles over 350 billion requests a day. Oleksii built the foundation of that large-scale infrastructure during hypergrowth while driving key reliability initiatives.

Presentations

Use of self-healing techniques and failure injections to build a reliable service at Adobe Session

In an award-winning paper, Nicolas Brousse and Oleksii Mykhailov found that a distributed infrastructure leveraging public cloud providers and a private cloud with open infrastructure can deliver dynamic advertising content with low latency while preserving high availability. Join them as they present their techniques and demonstrate how to design an ad-serving service that is resilient to failure.

Ryan Neal is head of infrastructure and part of the founding team at Netlify. Previously, he worked on the infrastructure team at Yelp and at Palantir in the Middle East. Ryan is based in San Francisco. He loves big data, fire spinning, and his golden retriever.

Presentations

Processing metrics with Golang and AWS Lambda Session

AWS Lambda and Golang are really powerful but difficult to use. Netlify uses both to process over 20 million events every hour. Ryan Neal shares a template project that lets you deploy a function and discusses gotchas he encountered while running it in production.

Jose Nino is the lead for the core server networking group at Lyft. Jose has been instrumental in creating systems to scale Lyft’s Envoy production environment for increasingly large deployments and engineering orgs. He’s worked as an open source Envoy maintainer and has nurtured Envoy’s growing community. Recently, Jose has moved on to scaling Lyft’s network load tolerance systems. Jose has spoken about Envoy and other related topics at several venues, most recently at KubeCon EU 2018 and on the main stage at KubeCon NA 2018.

Presentations

How Lyft migrated to a service mesh with Envoy Session

Lyft has made the transition from a single monolithic service to 300+ microservices by leveraging Lyft's open source proxy Envoy. Daniel Hochman and Jose Nino explain how Lyft migrated from a legacy monolithic application to over 300 microservices while keeping drivers, passengers, and developers happy.

Kris Nova works independently, focusing on containers, infrastructure, and Kubernetes, and she’s an ambassador for the Cloud Native Computing Foundation. Previously, she was a developer advocate and an engineer on Kubernetes at Heptio. Kris has a deep technical background in the Go programming language and has authored many successful open source tools in Go. She’s a Kubernetes maintainer and the creator of kubicorn, a successful Kubernetes infrastructure management tool. Kris organizes a special interest group in Kubernetes and is a leader in the community. She understands the grievances with running cloud native infrastructure via a distributed cloud native application and recently authored an O’Reilly book on the topic, Cloud Native Infrastructure. Kris lives in Seattle and spends her free time climbing mountains.

Presentations

Building resilience with Kubernetes Session

Details to come.

Renee Orser is the vice president of engineering at NS1, where she oversees all delivery and operations of NS1’s engineering organization. Renee brings deep expertise in facilitation, cross-functional communication, and brash problem solving to NS1’s teams. Previously, Renee spent a decade working and traveling in over 30 countries while managing teams delivering distributed, highly scalable digital healthcare products to governments and international nonprofits; her roles included senior program manager at ThoughtWorks, analyst at Partners In Health, and independent consultant. She holds a BA in international relations and Arabic from Tufts University.

Presentations

Two hearts, one mind: Delivering the same platform to SaaS and on-premises Session

NS1 first developed and operated a SaaS DNS platform, then shifted to releasing versioned software for on-prem use of its DNS products. Renee Orser shares the many lessons learned, including the relationship between system architecture and organizational design, while the team managed the challenges brought by diversification of a single platform across a suite of deployment models.

Jérôme Petazzoni is a DevOps advocate and international speaker. He was born and raised in France, where he worked on geographic information systems, voice over IP, video streaming, and encoding and started a cloud hosting company back when EC2 wasn’t an Amazon product yet. In California he built and scaled the dotCloud PaaS, which eventually gave birth to Docker. While at Docker, he represented the company at hundreds of conferences and events and trained thousands of engineers to use Docker, Swarm, and Kubernetes. He’s fluent in many languages (mostly programming ones), owns a dozen musical instruments, and can play the theme of Zelda on most of them.

Presentations

Kubernetes for administrators and operators 2-Day Training

Kubernetes has the reputation of being hard to set up and operate. Many cloud and service providers make it easier by offering managed clusters, but even then, maintaining and exploiting Kubernetes day to day require specialized skills. Jérôme Petazzoni uses hands-on exercises to teach you how to set up, operate, and maintain production Kubernetes clusters.

Neil Peterson is a senior content engineer at Microsoft, where he delivers technical documentation and samples with a focus on Azure and containers. A data center and cloud enthusiast, Neil has 15 years’ experience in large data center deployment, management, and maintenance operations.

Presentations

Cloud native infrastructure deployments with Terraform Tutorial

As we move toward hosting workloads on cloud-based resources such as virtual machines, storage, and container-based solutions, it's also important to modernize how these resources are deployed and managed. Neil Peterson walks you through methods for deploying cloud infrastructure with Terraform, including through a modern CI/CD pipeline.

Will Pressly is senior director of emerging business at Verizon Digital Media Services. Will has worked on the company’s engineering team in several capacities over the past seven-plus years and brings a broad range of technical experience.

Presentations

FaaS hackathon tutorial (sponsored by Verizon Digital Media Services) Tutorial

For those interested in developing more performant applications at the edge of the network, William Pressly provides an overview of Verizon Digital Media Services’ function as a service (FaaS) platform, which enables developers to run code at the network edge for richer, more personalized user experiences at ultralow latency. You'll then participate in an FaaS hackathon using the platform.

Alex Qin is out here trying to get free. She cares about helping people fulfill the radical visions they have for themselves and the world while working toward more just and equitable futures. She has experience writing code, leading engineering teams, growing communities, developing education programs, teaching, and advocating for underrepresented people in tech. These days, she spends her time learning, healing, giving love, and growing the Code Cooperative, a community of people who use and build technology to create life-changing possibilities for individuals and communities impacted by incarceration.

Presentations

How do we heal? Keynote

We've been working to foster a more diverse, inclusive, and equitable tech industry for years, but we have yet to see meaningful and lasting change. Drawing inspiration from restorative justice practices and her own journey of healing, Alex Qin offers a hopeful vision for how we can come together and cocreate the world we yearn for.

Bob Quillin is vice president of developer relations at Oracle Cloud Infrastructure (OCI), where he leads OCI developer relations, advocacy, engagement, and lighthouse adoption. Bob has focused his career on applying automation to simplify complex cloud, IT, and developer challenges. He joined Oracle as part of its 2015 acquisition of StackEngine, where he was cofounder and CEO. StackEngine was an early container-native pioneer building services and platforms designed to help developers and DevOps teams build, orchestrate, and scale enterprise-grade container apps. Previously, Bob was CEO of Austin-based cloud monitoring SaaS startup CopperEgg (acquired by IDERA in 2013) and held executive and startup leadership roles at nLayers, EMC, and VMware.

Presentations

The cloud native elephant in the room (sponsored by Oracle Cloud Infrastructure) Keynote

While cloud native appears to be on a winning streak, too many enterprise development teams are being left behind. Bob Quillin outlines how the cloud native community can create a more open multicloud future, reduce complexity (rather than piling more on), and be more inclusive to all teams: modern and traditional, startups and enterprises alike.

Rajesh Raman is a software engineer at SignalFx, where he works on the SignalFlow streaming analytics engine, developing novel ways to handle real-time, distributed computation of streaming time series data. Rajesh has over 15 years’ experience working on large-scale distributed systems, capacity management, and service monitoring at companies such as Facebook and Google. He holds a PhD in computer science from the University of Wisconsin, where his focus (and thesis) centered on distributed resource management for high-throughput computing.

Presentations

Kick-starting a culture of observability and data-driven DevOps (sponsored by SignalFx) Session

Rajesh Raman dives deep into the practice of observability, demonstrating how a more analytics-driven approach to metrics, traces, and other monitoring signals improves observability. You'll learn a framework for kick-starting a culture of observability in your organization, informed by Rajesh's experience building and deploying observability tools at SignalFx.

Shraya Ramani is a Los Angeles-based security infrastructure software engineer at BuzzFeed who has extensive experience with Golang. Previously, she worked at MongoDB as a software engineer on the server tools team. When she isn’t obsessing about tests, she enjoys learning about the ocean and octopuses.

Presentations

Securing services using SSO Session

As BuzzFeed transitioned to microservices, it needed to secure a growing number of internal tools. BuzzFeed's first solution was an open source auth service deployed in front of each app, but this approach had a number of scaling issues. Shraya Ramani discusses SSO, BuzzFeed's open source, homegrown, centralized solution that elegantly solved this problem.

Alex Rasmussen is a data engineering consultant. Previously, he was VP of engineering at Freenome, an AI genomics company, and an early employee at Trifacta, a pioneer in the data wrangling space. He holds a PhD from the University of California, San Diego (UCSD), where his dissertation focused on highly efficient large-scale data processing systems. While at UCSD, he led the TritonSort project, which set several world records in large-scale sorting.

Presentations

Schema evolution patterns Session

Alex Rasmussen explores several categories of solutions to the problem of schema evolution (what happens when the structure of your structured data or API payload changes) and the trade-offs associated with each.

Isobel Redelmeier works on open source software at LightStep, where she focuses on OpenTracing and other observability solutions to improve performance management across distributed systems. She learned firsthand how difficult, and how valuable, observability can be when working at Pivotal, where she pushed code in about 10 languages to different production systems while working with Pivotal Labs. She later focused on security in Cloud Foundry.

Presentations

Ariadne's thread through the labyrinth: Using observability to tame a rogue code base Tutorial

Every week, seven brave SWEs and seven brave SREs get sacrificed to the Minotaur: the legendary latency leech lurking somewhere in the labyrinthine depths. You've been tasked with rescuing your comrades. But even a hero as brave as you cannot possibly survive the maze without some help. Isobel Redelmeier shows you how to leverage observability to tackle distributed system problems.

Liz Rice is the technology evangelist at container security specialists Aqua Security and coauthor of the O’Reilly report Kubernetes Security. She has a wealth of software development, team, and product management experience from her years spent working on network protocols and distributed systems and in digital technology sectors such as video on demand (VOD), music, and voice over internet protocol (VoIP). When not building startups and writing code, Liz loves riding bikes in places with better weather than her native London or racing in virtual reality on Zwift.

Presentations

Lessons from hacking Kubernetes with kube-hunter Session

Kube-hunter is an open source penetration testing tool for checking the security of your Kubernetes cluster. Liz Rice explores how kube-hunter finds problems and misconfigurations and shares the lessons about securing your cluster learned along the way.

Jeremy Rickard is a software engineer on the Azure container compute team working remotely in Colorado. He works on a number of open source projects, including Virtual Kubelet, Open Service Broker for Azure, service catalog, and Cloud Native Application Bundle tooling. Previously, he worked at VMware and helped build infrastructure and services that support VMware Cloud Services and built services using Spring, Cloud Foundry, and Kubernetes.

Presentations

Pack your bags: Managing distributed applications with CNAB Tutorial

Jeremy Rickard and Carolyn Van Slyck introduce the Cloud Native Application Bundle (CNAB) specification and teach you how to author bundles using Porter to deploy complicated modern applications with load balancer creation, certificate management, application deployment, and persistent storage.

Aaron Rinehart has been expanding the application of chaos engineering to other safety-critical domains, notably cybersecurity. He pioneered the application of security in chaos engineering during his tenure as the chief security architect at UnitedHealth Group (UHG), the largest private healthcare company in the world. While at UHG, Aaron released ChaoSlingr, one of the first open source tools focused on using chaos engineering in cybersecurity to build more resilient systems. A frequent author, consultant, and speaker, Aaron also recently founded the first chaos engineering meetup in Washington, DC.

Presentations

Security precognition: A look at chaos engineering in security incident response Session

Chaos engineering allows security incident response teams to proactively experiment on recurring incident patterns to uncover previously unknown underlying factors. Join Aaron Rinehart to explore the hidden costs of security incidents, learn a new technique for uncovering weaknesses in system security, and more.

Sasha Rosenbaum is a program manager on the Azure DevOps engineering team, focused on improving the alignment of the product with open source software. She’s a co-organizer of the DevOpsDays Chicago conference and recently published a book on serverless computing in Azure with .NET.

Presentations

The telemetry data revolution at Microsoft (sponsored by Microsoft) Session

Interested in becoming more data driven and empowering your peers and coworkers with insights and data? Yoni Leibowitz and Sasha Rosenbaum share how Microsoft has been continually transforming its engineering, support, finance, and marketing work with new technology for data-driven decisions.

Christian Saide is a DevOps engineer at NS1, where he has been a key player in automating, hardening, and scaling out its systems, particularly by pushing more and more of its infrastructure into container-based architectures and implementing solutions to the tough problems surrounding global distribution. He also played a critical role in NS1’s move to software-defined networking and authored the primary software-defined networking device and network topology. Christian has been working in the technology sector for five years, focusing on networking and distributed systems. Previously, he was at Industrial Color Software, where he climbed from midlevel software developer to director of development operations and was instrumental in taking the company’s aging infrastructure from a handful of bare-metal servers to multiple virtualization hosts running hundreds of virtual machines, which in turn supported hundreds of containers.

Presentations

DDoS mitigation made easy with XDP and eBPF Tutorial

Christian Saide shows you how to defend your infrastructure against costly DDoS attacks by blacklisting or whitelisting traffic, load-shedding, and analyzing traffic using XDP and eBPF.

Osman Sarood leads the infrastructure team at Mist Systems, where he helps Mist scale the Mist Cloud in a cost-effective and reliable manner. Osman has published more than 20 research papers in highly rated journals, conferences, and workshops and has presented his research at several academic conferences. He has over 400 citations along with an i10-index and h-index of 12. Previously, he was a software engineer at Yelp, where he prototyped, architected, and implemented several key production systems and architected and authored Yelp’s autoscaled spot infrastructure, fleet_miser. Osman holds a PhD in high-performance computing from the University of Illinois Urbana-Champaign, where he focused on load balancing and fault tolerance.

Presentations

How embracing unreliability can make infrastructure reliable and cost-effective Session

Server faults are a reality. While public cloud vendors try to improve hardware reliability, software should play its part by being resilient to server failures. Mist ingests terabytes of telemetry data daily for machine learning. Osman Sarood explains how the company runs 80% of its production infrastructure reliably on AWS Spot Instances while keeping annual costs at $2 million.

Aaron Schlesinger is a developer advocate at Microsoft Azure and a core maintainer of the Athens Project. Before Athens, he was a core maintainer and chair of the Kubernetes SIG-Service-Catalog and a contributor to various other projects in the Kubernetes community.

He has 15+ years of software engineering experience ranging from frontend design to distributed data systems. He discovered Go around 2013 and Kubernetes in 2015 and hasn’t looked back. He lives in Portland, OR, where he and his wife love to run up and down mountains together.

Presentations

Kubernetes is still hard for app developers; let’s fix that Session

Aaron Schlesinger dives into case studies on why and how it's hard for app developers to adopt Kubernetes. He walks you through tools that make the transition easier, providing a holistic view of how everything fits together to make Kubernetes easier for teams. You'll leave with what you need to get your team started with Kubernetes or to improve its productivity on Kubernetes.

Gwen Shapira is a system architect at Confluent, where she helps customers achieve success with their Apache Kafka implementations. She has 15 years of experience working with code and customers to build scalable data architectures, integrating relational and big data technologies. Gwen currently specializes in building real-time reliable data processing pipelines using Apache Kafka. Gwen is an Oracle Ace Director, the coauthor of Hadoop Application Architectures, and a frequent presenter at industry conferences. She is also a committer on Apache Kafka and Apache Sqoop. When Gwen isn’t coding or building data pipelines, you can find her pedaling her bike, exploring the roads and trails of California and beyond.

Presentations

Monitor disk space and other ways to keep Apache Kafka happy Session

After five years of helping hundreds of customers use Apache Kafka, Gwen Shapira has seen it all. She provides an overview of the most common ways Apache Kafka users manage to cause downtime and lose data, and explains how to avoid them.

Priyanka Sharma is the director of Cloud Native Alliances at GitLab, the only cloud-agnostic single application for the entire DevOps lifecycle. Priyanka has worked on several Cloud Native Computing Foundation (CNCF) projects, with her deepest expertise being the OpenTracing standard. She’s worked on Jaeger tracing and Vitess and given talks on Kubernetes, Prometheus, Envoy, and the Secure Production Identity Framework for Everyone (SPIFFE)/SPIRE. A former entrepreneur with a passion for building developer products and growing them through open source communities, Priyanka advises startups at HeavyBit Industries, an accelerator for developer products. She holds a BA in political science from Stanford University.

Presentations

Deploying serverless applications to any cloud with Knative Tutorial

Priyanka Sharma and Sébastien Goasguen walk you through deploying serverless functions to any cloud provider of choice, breaking the shackles of vendor lock-in.

Marcelo Soares is a senior IT manager at Dell, where he leads the application performance management organization globally. He has 21 years of IT industry experience across multiple verticals, covering a wide range of disciplines from IT operations and support to testing and quality, and has also led development delivery organizations.

Presentations

25 billion transactions and counting: How Dell manages application performance at scale (sponsored by Riverbed) Session

The scale of cloud native environments can be overwhelming. How do you employ cutting-edge technology to ensure the best app performance for customers executing thousands or even billions of transactions a day? Jon Hodgson, Jeremy Tupa, and Marcelo Soares share practical advice you can apply to your own environment, whether large or small.

Ines Sombra is director of engineering at Fastly, where she spends her time helping the web go faster. Ines holds an MS in computology with an emphasis on cheesy ’80s rock ballads. She has a fondness for steak, fernet, and a pug named Gordo. In a previous life, she was a data engineer.

Presentations

Thursday opening welcome Keynote

Program chairs Nikki McDonald, Ines Sombra, and James Turnbull open the second day of keynotes.

Wednesday opening welcome Keynote

Program chairs Nikki McDonald, James Turnbull, and Ines Sombra open the first day of keynotes.

Dimitri Stiliadis is the cofounder and CTO of Aporeto. He comes from a multidisciplinary background in distributed systems, security, and networking and is the inventor of several groundbreaking technologies in these areas. Previously, he was the cofounder and CTO of Nuage Networks, where he led the development of the industry-leading Virtualized Services Platform. He’s held several leading roles in Bell Labs Research, where he led a series of research programs with fundamental contributions in networking, algorithms, optical networks, and distributed systems.

Presentations

Identity is the new security perimeter (sponsored by Aporeto) Session

Application delivery now spans a range of technologies and deployment models: virtual machines (VMs), containers, and serverless functions. Protecting these environments efficiently and minimizing errors is challenging unless your security is based on application identity and verifiable policies. Dimitri Stiliadis explains how to achieve robust security with end-to-end authentication (authN), authorization (authZ), and encryption.

Jeremy Tupa is an application performance management consultant at Dell. His 20-year career has spanned application development and architecture, service operations, and performance management. Over the years, Jeremy has lent his expertise to a variety of apps across the Dell portfolio, including Dell.com, mobility, services, manufacturing, corporate finance, and HR. When he’s not wearing his cape as a performance superhero for Dell, Jeremy can be found out in the wild camping with his son or playing soccer or video games.

Presentations

25 billion transactions and counting: How Dell manages application performance at scale (sponsored by Riverbed) Session

The scale of cloud native environments can be overwhelming. How do you employ cutting-edge technology to ensure the best app performance for customers executing thousands or even billions of transactions a day? Jon Hodgson, Jeremy Tupa, and Marcelo Soares share practical advice you can apply to your own environment, whether large or small.

James Turnbull is VP of engineering at Glitch. A longtime member of the open source community, James is the author of a number of books about open source software. Previously, he was a CTO in residence at Microsoft, founder and chief technology officer at Empatico and Kickstarter, VP of engineering at Venmo, and an adviser at Docker. James likes food, wine, books, photography, and cats. He is not overly keen on long walks on the beach or holding hands.

Presentations

Thursday opening welcome Keynote

Program chairs Nikki McDonald, Ines Sombra, and James Turnbull open the second day of keynotes.

Wednesday opening welcome Keynote

Program chairs Nikki McDonald, James Turnbull, and Ines Sombra open the first day of keynotes.

Carolyn Van Slyck is a software developer based in the wilds of suburban Chicago, working remotely on the Microsoft cloud native team. Her passion is developer tools and building vibrant, inclusive open source communities around them. She’s a maintainer for the Cloud Native Application Bundle (CNAB) spec, Duffle, Porter, the Kubernetes service catalog, and the GoMods Athens proxy. Carolyn runs Women Who Go and organizes for the Chicago chapters of Women Who Go and Write/Speak/Code. In between code reviews, Carolyn hauls her cookies around the world to share her love of open source, containers, and excessive emoji.

Presentations

Pack your bags: Managing distributed applications with CNAB Tutorial

Jeremy Rickard and Carolyn Van Slyck introduce the Cloud Native Application Bundle (CNAB) specification and teach you how to author bundles using Porter to deploy complicated modern applications with load balancer creation, certificate management, application deployment, and persistent storage.

Seth Vargo is an engineer at Google Cloud. Previously, he worked at HashiCorp, Chef Software, CustomInk, and some Pittsburgh-based startups. He is the author of Learning Chef and is passionate about reducing inequality in technology. When he is not writing, working on open source, teaching, or speaking at conferences, Seth advises nonprofits.

Presentations

Base64 is not encryption: A better story for Kubernetes secrets Tutorial

By default, all Kubernetes secrets are base64-encoded and stored as plaintext in etcd. Seth Vargo shares techniques for securing Kubernetes secrets, including encryption, KMS plug-ins, and tools like HashiCorp Vault, along with the trade-offs of each approach, so you can better secure your clusters.

John Voorhees is the founder and CEO of Primitive, where he and his team are working to transform the practice of software development through the power of data visualization in virtual reality (VR). VR has the potential to revolutionize software development by offering a new medium that fundamentally changes the way teams interact with code. The company’s goal is to create a development environment that immerses a team in interactive 3D representations of complex software, allowing for an unprecedented look at code. John is a computer-aided design (CAD) engineer by training. As a self-taught programmer, John understands the challenge of learning to code. One of the main reasons he started Primitive was a desire to create a programming interface similar to the kinds of tools he had used in CAD.

Presentations

Immersive development Session

Almost all science fiction representations of the future involve some version of an immersive, holographic interface with technology. The aspiration at the heart of this vision is that working with technology will one day become an extension of our existing visual, tactile understanding. John Voorhees discusses using virtual reality to connect developers in a spatial representation of code.

Heidi Waterhouse is a developer advocate with LaunchDarkly. She delights in working at the intersection of usability, risk reduction, and cutting-edge technology. One of her favorite hobbies is talking to developers about things they already knew but had never thought of that way before. She sews all her conference dresses so that she’s sure there is a pocket for the mic.

Presentations

Everything is a little bit broken; or, The illusion of control Session

Heidi Waterhouse explains how to handle uncertainty by adding error budgets, layered access, and other accommodations for failure, and by designing your systems for function over form or purity.

Shannon Weyrick is vice president of architecture at NS1. A 20-year veteran of internet infrastructure, Shannon is an accomplished technical architect, developer, and leader whose experience encompasses both development and operations of globally distributed platforms. Previously, Shannon worked at INAP and F5. A regular open source contributor, he has led and worked on a wide range of infrastructure projects from high-performance servers to novel programming languages and runtimes, and he enjoys writing and speaking at industry conferences.

Presentations

Extracting signal: Fast traffic analysis and smart aggregation in global edge networks Session

Many platforms require a deep understanding of client traffic. But efficiently divining the desired signal from the continuous stream of traffic in at-scale edge networks is an enormous challenge. Shannon Weyrick and Clare Gollnick discuss strategy and technology for analysis and aggregation at the edge, plus centralized collection, all based on real-world use cases.

Richard Whitehead is chief evangelist at Moogsoft, a pioneer and leading provider of AIOps solutions that help IT teams work faster and smarter, where he is responsible for the successful introduction of products and technologies to the market. Previously, Richard was a consultant at JRW Strategies, where he provided product strategy, new market introduction, and competitive analysis advice to the software industry. Richard is the cochair of the monitoring and analytics working group at ONUG.

Presentations

Overcoming tomorrow's operational challenges with AIOps (sponsored by Moogsoft) Session

Artificial intelligence for IT operations (AIOps) breaks with the traditional, bottom-up, rules-driven approach to incident management. Using AIOps, you can improve customer service, lower operational costs, and boost productivity. Richard Whitehead explores how AIOps can deliver continuous service assurance.

Phillip Wittrock is a staff software engineer at Google, a member of the Kubernetes Steering Committee, and a Kubernetes SIG CLI technical lead. Phillip’s hobbies include debating how kubectl is pronounced and talking about Kubernetes at social events.

Presentations

Kubernetes APIs under the hood Session

Kubernetes provides a powerful set of APIs and abstractions for building distributed systems, and it provides users with the ability to build and install their own extension APIs alongside the core APIs. Phillip Wittrock covers how core Kubernetes APIs are designed and built and teaches the basics of writing an installable custom Kubernetes API.

April C. Wright is a hacker, O’Reilly author, teacher, and community leader who has been breaking, making, fixing, and defending the security of global critical communications and connections for over 25 years. She’s an international speaker and trainer, educating and advising on matters of privacy and information security with the goal of safeguarding the digital components we rely on every day. April has held roles on defensive, operational, adversarial, and development teams throughout her career, and she has spoken at and contributed to numerous worldwide security conferences, including Black Hat, DEF CON USA and DEF CON China, DerbyCon, Hack in Paris, DefCamp Romania, and ITWeb South Africa, as well as for the US government and industry organizations such as OWASP and ISSA. She has started multiple small businesses, including a nonprofit, handles communications for DEF CON Groups community outreach, and in 2017 cofounded a popular local Boston community event (DC617). April has collected dozens of certifications to add capital letters at the end of her name and almost died in Dracula’s secret staircase. She once read in the Onion that researchers at the University of North Carolina released a comprehensive report in 2014 confirming her status as the “most significant and interesting person currently inhabiting the earth,” and it was on “teh internet,” so it must be true.

Presentations

Deepfakes: If anything can be real, then nothing is real Session

April Wright explores the possible ramifications of deepfakes, from privacy violations to personal and professional embarrassment to global thermonuclear war, and considers what can be done to protect ourselves, emphasizing the need to remain critical of what we see as this technology gets better and better.

Ruth Yakubu is a senior cloud developer advocate at Microsoft and founder of PoshBeauty.com. She specializes in Java, cloud, advanced analytics, data platforms, and AI. She’s a tech speaker at conferences like Devoxx, JavaWithBest, DeveloperWeek, TechSummit, TechInProto, and LambdaWorld as well as events for developer communities. She’s worked for great companies like Unisys, Accenture, and DIRECTV over the years, gaining a lot of experience with software architectural design and programming. She was named a DZone Most Valued Blogger.

Presentations

Building serverless solutions that are resilient, scalable, and cost effective Session

Ruth Yakubu explores end-to-end serverless scenarios on Microsoft Azure Functions, Azure Cosmos DB, and Event Grid.

Christine Yen is the cofounder of Honeycomb, a startup with a new approach to observability and debugging systems with data. Christine has built systems and products at companies large and small and likes to have her fingers in as many pies as possible. Previously, she built Parse’s analytics product (and leveraged Facebook’s data systems to expand it) and wrote software at a few now-defunct startups.

Presentations

Observability for developers: How to get from here to there Session

Observability may be the hot new thing, but for many devs, it's unclear how to gracefully get from where they are now (searching across logs or using canned APM tools) to debugging production with ease. Christine Yen makes the case that observability can be more valuable to devs than ops, and she lays out a series of practical steps to up-level a team's ability to ask questions of production.