Build & maintain complex distributed systems
October 1–2, 2017: Training
October 2–4, 2017: Tutorials & Conference
New York, NY

Speakers

Hear from a wide range of talented people who are doing amazing things in web operations, performance, DevOps, and systems engineering. New speakers are added regularly. Please check back to see the latest updates to the agenda.

Craig Adams is vice president of web performance product management at Akamai, where he has global responsibility for the strategy and management of Akamai’s web experience solutions portfolio, which enables Akamai customers to deliver innovative, instant web experiences on any device, anywhere. Craig has a diverse background that spans over 15 years at Akamai, with leadership roles in sales, services and support, technical services delivery, and country management. Craig is passionate about creating and launching new programs and offerings that enable Akamai customers to successfully meet their business objectives while also contributing to Akamai’s growth and profitability. Under Craig’s leadership, Akamai adopted the use of the Net Promoter Score to monitor and increase customer satisfaction. Craig also launched Akamai University for Customers, which has certified over 3,000 customers on Akamai solutions, the Customer Experience program, and the Customer Champion Awards, driving a customer-first focus at Akamai.

Presentations

Automating content delivery in a DevOps world (sponsored by Akamai) Keynote

As the industry moves to distributed systems and a DevOps model, companies must adopt DevOps in order to automate CI/CD workflows and increase deployment velocity. Craig Adams explores the traditional DevOps pipeline, addresses how to think about CDN automation, and explains how Akamai is baking automation into its CDN.

Tom Adams is a hands-on technical principal at ThoughtWorks, a software company and a community of passionate individuals who are seeking to revolutionize the IT industry. He believes the hardest problem for a developer is fighting the urge to solve a different, more interesting problem than the one at hand. Tom is interested in helping teams deliver awesome software products, sharing his knowledge, and all the tools (especially Java).

Presentations

Creating pipelines to build, test, and deploy containerized artifacts Tutorial

Containerization has launched a new wave of software deployment models, but do our philosophies for building, testing, and deploying software still hold true? Tom Adams walks you through creating a build pipeline for Docker images that is rooted in continuous integration (CI) practices.

Neeraj Agrawal is an engineering manager on Quora’s infrastructure team, where he leads company-wide efforts to make Quora fast and responsive. He started as an engineer working on server-side performance and has since revamped how the company thinks about performance. In his role as the speed lead, he has built the speed team from the ground up and created a culture of “performance matters” at Quora. Previously, he worked on other parts of the product, including writing the first version of Quora for iPhone and building a distributed, real-time ranking service for the personalized news feed product.

Presentations

Scaling the machine learning systems powering Quora's home feed Session

Millions of people visit Quora's home feed to find high-quality content personalized to their interests. It is powered by a highly performant distributed system running sophisticated ML algorithms. Nikhil Garg and Neeraj Agrawal describe the evolution of the home feed's architecture and share several lessons from building and scaling this system.

Kevin Beck is a senior software engineer at New Relic working on the team responsible for data ingest and all things Kafka. Previously, Kevin spent 15 years at IBM and Informix, working on distributed relational databases.

Presentations

Thriving under a continuous self-inflicted DDoS attack Session

New Relic customers send monitoring data to New Relic servers every minute—a continuous firehose of data. Drawing on his experience at New Relic, Kevin Beck shares best practices for building a streaming service based on Apache Kafka, self-monitoring for reliability and fault tolerance, and building a DevOps culture that anticipates and prevents outages.

Kristopher Beevers is founder and CEO of NS1, the next-gen DNS and traffic management company. Previously, Kristopher led platform development at Voxel.net (acquired by Internap), where he built cloud and bare metal platforms, content delivery networks, and other distributed infrastructure products. Kristopher holds BS, MS, and PhD degrees in computer science from RPI.

Presentations

Fly the airplane (sponsored by NS1) Keynote

During active operational incidents, we experience very human reactions that get in the way of resolution. Approaches like Incident Command provide solid foundations for incident response. Kristopher Beevers explains how to augment Incident Command with simple tools and processes that help your team focus, communicate effectively, and respond calmly and precisely during mission-critical events.

From prototype to mission critical: The evolution of edge architecture at a global DNS service provider Session

Kristopher Beevers discusses the evolution of the edge delivery architecture of major DNS service provider NS1, from its earliest prototypes to the large, heavily automated global network it operates today, and the many operational lessons learned along the way.

Meet the Experts with Kristopher Beevers Meet The Experts

Join Kristopher to discuss topics like DNS resiliency, traffic management setups, application delivery architectures, global infrastructure management, and service provider redundancy.

David Belson is senior director of internet research and analysis at Oracle+Dyn, where he drives thought-leadership efforts supported by the company’s unique portfolio of internet intelligence tools and associated data. Previously, David spent 18 years at Akamai Technologies, where he was most recently senior director of industry and data intelligence, responsible for data-driven evangelism, including Akamai’s quarterly State of the Internet report series, and for leading competitive intelligence and analysis efforts. Before that, he worked at GTE Internetworking/BBN, where he supported sales of one of the first enterprise-class managed web hosting services and worked on an early prototype of what would now be considered a SaaS platform. He holds a master of technical and professional writing from Northeastern University and a BS in computer science and a BA in science, technology, and society, both from Stevens Institute of Technology in Hoboken, NJ.

Presentations

Fixing the internet everyday: Internet volatility, the vigilance that fixes it, and why it matters (sponsored by Oracle+Dyn) Session

Although we often think of “breaking the internet” in the context of a website that couldn't handle the traffic associated with a piece of viral media content, behind the scenes, critical pieces of internet infrastructure break on a regular basis. David Belson dives into some of these issues and explains how you can avoid being impacted by them.

Andrew Betts is principal developer advocate at Fastly. Andrew’s area of expertise is emerging web technologies, particularly on mobile and tablet platforms. Previously, he was a PHP and JavaScript developer, web standards advocate, and founder of FT Labs, an emerging web technologies division of the Financial Times, where he and his team created the FT web app, one of the best examples of what can be achieved with HTML5.

Presentations

The Vary header and the future of cache variation at the edge Session

Most people working with CDN caches know about the Vary header, but few properly understand what it really does. With the advent of the Key header, new patterns for varying cache content will soon emerge. Andrew Betts shares common and advanced use cases for Vary, such as language, A/B testing, compression, and service worker support, and outlines potential changes to consider when Key arrives.

Blake Bisset got his first legal tech job at 16. He won’t say how long ago, except that he’s legitimately entitled to make shakey fists while shouting, “Get off my LAN!” He’s cofounded three startups—a joint venture with Dupont/ConAgra, a biotech spinoff from UW, and one that started this time a bunch of kids were sitting around on New Year’s Eve, wondering why they couldn’t watch movies on the internet—only to end up spending a half-decade as an SRM at YouTube and Chrome, where his happiest accomplishment was holding the go/bestpostmortem link for several years.

Presentations

Persistent SRE anti-patterns: Pitfalls on the road to creating a successful SRE program like Netflix and Google Session

People aren't just wrong on the internet. Sometimes they bring it back to the office. Blake Bisset and Jonah Horowitz share stories about anti-patterns in monitoring, incident response, configuration management, and more and explain how Google and Netflix view the role of the SRE (and how it differs from the traditional system administrator role).

Shawn Bower is the cloud architect for Cornell University, where he has helped to move many of the university’s workloads to Docker and the cloud. When not coding, Shawn is sleeping.

Presentations

Docker production: Orchestration, security, and beyond Tutorial

Starting where previous Docker workshops leave off, Bret Fisher, Shawn Bower, and Tony Pujals dive into the new Swarm mode clustering (services), failover, blue-green deployments, monitoring, logging, troubleshooting, and security, covering the latest built-in features and common third-party tools as they walk you through installing them on your own five-node cloud Swarm cluster.

Meet the Experts with Bret Fisher and Shawn Bower Meet The Experts

Want to know how to improve your local development workflow, how to deploy Docker Swarm, or how to deal with other stumbling blocks on the way to deploying containers? Come chat with Bret and Shawn about all things Docker and containers. No question is too big or too small.

VM Brasseur (aka Vicky) is a manager of technical people, projects, processes, products, and businesses. In her nearly 20 years in the tech industry, Vicky has been an analyst, programmer, product manager, software engineering manager, technical and C-level business consultant, and director of software engineering. She is a winner of the 2014 Perl White Camel Award and a winner of the 2016 O’Reilly Open Source Award. Vicky occasionally blogs, often writes, and frequently tweets at @vmbrasseur. She is also a community moderator for Opensource.com.

Presentations

Find your way: Orienteering for managers Session

Are you managing distributed teams with very different stakeholders—perhaps even a mix of hobbyists and paid staff? It probably seemed easy at first, but the further you travel, the more unfamiliar the terrain appears. Luckily, this is not new ground. Many have gotten lost here before and found their way out again. VM Brasseur and Deb Nicholson share a map to productive, happy teams.

Meet the Experts with VM Brasseur and Deb Nicholson Meet The Experts

Deb and Vicky will answer all your questions about team composition, conflict resolution, communication strategies, goal and expectation setting, or anything else dealing with creating a healthy cross-functional team.

Joseph Breuer is a senior software engineer on Netflix’s playback licensing team.

Presentations

Event sourcing on a global scale: Netflix downloads Session

The Netflix download feature allows users to download content for offline playback. Implementing this feature required a new persistence architecture to maintain the state of user devices and content licenses. Joseph Breuer and Robert Reta explore the technical decisions behind the choice of a Cassandra event sourcing data store.

Yevgeniy “Jim” Brikman is the cofounder of Gruntwork, a company that uses Terraform to create infrastructure packages to get customers up and running on AWS in under two weeks; the company also provides Terraform training. Previously, Yevgeniy was a software engineer at LinkedIn, TripAdvisor, Cisco Systems, and Thomson Financial. He loves programming, writing, speaking, traveling, and lifting heavy things. He is the author of A Comprehensive Guide to Terraform, a series of informative blog posts published on Gruntwork’s blog, and the O’Reilly books Terraform: Up & Running and Hello, Startup: A Programmer’s Guide to Building Products, Technologies, and Teams. Yevgeniy holds a BS and a master’s degree, both from Cornell University.

Presentations

Infrastructure as code with Terraform 2-Day Training

Terraform is a key tool for managing infrastructure as code across a variety of platforms, including AWS, Google Cloud, and Azure. Yevgeniy Brikman gets you up and running with Terraform using real-world examples. You'll learn how to deploy servers, DBs, and load balancers on AWS, build immutable infrastructure with Docker and Packer, and put it all together in a continuous delivery pipeline.

Brendan Burns is a director of engineering at Microsoft Azure, where he runs the container service and resource manager teams, and a cofounder of the Kubernetes open source project. Previously, he worked at Google on cloud APIs and web search infrastructure and was a professor of computer science at Union College. Brendan holds a PhD in computer science from the University of Massachusetts Amherst and a BA in computer science and studio art from Williams College.

Presentations

Distributed systems patterns: From design to reality Session

Formal patterns for distributed systems make it significantly easier to design and deploy reliable, scalable distributed systems. Brendan Burns explains how to transform these patterns into containers and a custom Kubernetes API, which you can use to simply instantiate a distributed system via declarative API.

Meet the Experts with Brendan Burns Meet The Experts

Brendan is here to talk with you about Microsoft Azure, the cloud and cloud-native applications, containers, Kubernetes, and distributed systems.

Pieter Buteneers is a data strategist and machine learning consultant at CoScale. Pieter is relatively new to the DevOps field. Previously, he was a postdoc at Ghent University, where he did research on AI, machine learning, and deep learning. Pieter has also given a TEDx talk (on a somewhat unrelated subject).

Presentations

A hands-on data science crash course for modeling and predicting the behavior of (large) distributed systems Tutorial

Data science is a hot topic. Bart De Vylder and Pieter Buteneers offer a practical introduction that goes beyond the hype, exploring data analysis, visualization, and machine learning techniques using Python for modeling the behavior of distributed systems. You'll leave with a solid starting point to implement data science techniques in your infrastructure or domain of interest.

Robert Castley is senior performance engineer for EMEA at Catchpoint Systems. A web performance specialist, web developer, and all-around good-looking geek, Robert has been optimizing web page performance for years. He specializes in APIs, CSS, JavaScript, HTML5, XML, web design and UX, and performance.

Presentations

How real is real-user measurement? (sponsored by Catchpoint) Keynote

Some quarters of our market believe that real-user measurement (RUM) is the end-all and be-all of customer experience management. But with the advancement of ad blockers, RUM tags are often getting blocked. Robert Castley explores the relevancy of real-user data if real users are blocking RUM tags and shares some solutions.

Meet the Experts with Robert Castley Meet The Experts

Robert has been optimizing web page performance for years and specializes in all things digital experience management, end-user experience, real-user measurement, and synthetic monitoring. Come chat with him all about APIs, CSS, JavaScript, HTML5, XML, web design and UX, and performance.

Jack Chan is a senior engineering manager in Shutterfly’s Photos Group. He was recently heavily involved with helping the company with a hybrid cloud migration solution with Photos-related API services on AWS, paired with a set of core services in a private data center. Jack has been working in software engineering development closely partnered with operations for quite some time, helping startups scale up to millions of users with cloud solutions. Previously, he worked in IT organizations at Adobe, Apple, and 3Com.

Presentations

How Shutterfly migrated 10+ billion photos to the cloud Session

Jack Chan describes how Shutterfly migrated metadata from over 10B photos from a private data center into AWS in 100 days and explores designs to absorb mountains of metadata, on-premises ecommerce integration, and parallel user experiences, all in a highly scalable fashion. Shutterfly Photos is now a hybrid cloud solution with images hosted on-premises and client-facing photos' metadata on AWS.

Vinu Charanya is a senior software engineer at Twitter, where she is building Twitter’s internal cloud infrastructure management platform. Vinu is also a core team member of Women Who Code, a nonprofit organization dedicated to inspiring women to excel in technology careers. She is also a part-time teacher of iOS and Android development and a mentor at CodePath. She holds a master’s degree in computer science and engineering from the University at Buffalo, where she worked in the PhoneLab Testbed research group under Geoff Challen and Steve Ko.

Presentations

How Twitter built a framework to improve infrastructure utilization and efficiency at scale Session

Twitter is powered by thousands of microservices running on an internal cloud platform, which offers compute, storage, messaging, monitoring, etc. as a service. Vinu Charanya explains how she and her team are building a system that captures, defines, provisions, meters, and charges infrastructure resources, redefining how systems are built atop Twitter infrastructure.

Mark Cho is a software engineer at Pinterest.

Presentations

Genji: A framework for building resilient near-real-time data pipelines Session

Pinterest has to support real-time decision making while operating on petabyte-scale data. Swaminathan Sundaramurthy and Mark Cho offer an overview of Pinterest's real-time data pipeline (modeled on quasi-Kappa architecture), its impact on the company's systems, and the tools and processes used, and they demonstrate how Pinterest models real-time ads analytics on the platform.

Rob Claire is an engineer on the visibility team at Pinterest, where he focuses on extracting insight from real-time operational data. Rob has more than 17 years of experience in the fields of data engineering, DevOps, and performance tuning. His career has included stints at One King’s Lane, Slide, Ning, and eBay.

Presentations

Drinking from the fire hose: Building a massive-scale monitoring stack Session

Rob Claire explores the technical challenges and lessons learned in building a monitoring stack that can reliably process millions of events per second, covering specific technologies—including Spark Streaming, Kafka, and HBase—and best practices for managing and monitoring data.

Richard Cook is a research scientist in the Department of Integrated Systems Engineering at the Ohio State University in Columbus, Ohio, and emeritus professor of healthcare systems safety at Sweden’s KTH. A physician, researcher, and educator, Richard is an internationally recognized expert on safety, accidents, and human performance at the sharp end of complex, adaptive systems. His most often cited publication is “Going Solid: A Model of System Dynamics and Consequences for Patient Safety.”

Presentations

Above the line, below the line: A preview of the SNAFUcatchers Stella Report Keynote

David Woods and Richard Cook offer a glimpse at the SNAFUcatchers Stella Report.

Meet the Experts with Richard Cook Meet The Experts

Come chat with Richard about his experience with managing anomalies, his concept of "dark debt," and the paradoxes that dog efforts to enhance resilience.

Matt Cutts is acting administrator and distinguished engineer at the US Digital Service, which delivers better government services through technology and design. Previously, Matt worked at Google, where he wrote the first version of SafeSearch, Google’s family filter, and led the webspam team, where he protected the quality of Google’s search results and answered questions about search engine optimization and ranking algorithms.

Presentations

Government is a system. Keynote

In government, you can still find out-of-date tech practices like writing requirements for years or launching systems without monitoring. The government wants more effective technology. Meanwhile, everyone else wants a more effective government. Matt Cutts discusses how better technology can improve not just software systems but also trust in government itself.

Meet the Experts with Matt Cutts Meet The Experts

Join Matt Cutts for Meet the Experts. Matt is acting administrator and distinguished engineer at the US Digital Service, which delivers better government services through technology and design.

What happens when technology and government mix? Session

When the Healthcare.gov website failed, it was a turning point and an opportunity. In the last few years, hundreds of engineers, designers, and product managers have signed up to do tours of service in government. Matt Cutts explores what happens when technology and government mix. A lot of interesting things, it turns out.

Arshan Dabirsiaghi is chief scientist at Contrast Security, where he draws on experience to guide the product line, drive new products and features, and spread the gospel about binary instrumentation. Arshan is an accomplished security researcher with over 10 years of experience advising large organizations on application security. Previously, Arshan held a research role at Aspect Security, where he used static and dynamic technology to perform security assurance work, including code reviews, architecture reviews, and penetration testing. Arshan quickly discovered that securing applications was a massive undertaking—one that requires innovative, deeply accurate technology and continuous testing.

Presentations

Struts 2, Equifax, and you: The story of the worst breach in history (sponsored by Contrast Security) Session

Arshan Dabirsiaghi explains what Contrast Security learned from the Struts 2 exploit and details how to stop the next attack against your production apps.

Dino Dai Zovi is CTO at Capsule8. An established researcher and innovator in the cybersecurity community with over a decade of experience in red teaming, penetration testing, software security, information security management, and mobile security R&D, Dino is best known for winning the first PWN2OWN contest at CanSecWest 2007. Previously, he served as the mobile security lead at Square, building out the platform that ensures Square’s sellers’ mobile devices are safe. He has also held security leadership roles with Endgame, Two Sigma Investments, and Matasano Security. Dino is a member of the Black Hat Review Board and a regular speaker at information security conferences around the world, including DEF CON, Black Hat, and CanSecWest. He is a coauthor of The iOS Hacker’s Handbook, The Mac Hacker’s Handbook, and The Art of Software Security Testing.

Presentations

The role of open source in a company Keynote

Jessica Frazelle and Dino Dai Zovi—technologists who have spent their careers managing the balancing act between community and commercial perspectives—discuss how to be effective at open source in your company.

Bart De Vylder is a data scientist at CoScale. Previously, Bart was active in software engineering and architecture, with a focus on distributed systems. His interests lie in machine learning and building reliable, scalable data processing systems. Bart holds a PhD in artificial intelligence from the Free University of Brussels.

Presentations

A hands-on data science crash course for modeling and predicting the behavior of (large) distributed systems Tutorial

Data science is a hot topic. Bart De Vylder and Pieter Buteneers offer a practical introduction that goes beyond the hype, exploring data analysis, visualization, and machine learning techniques using Python for modeling the behavior of distributed systems. You'll leave with a solid starting point to implement data science techniques in your infrastructure or domain of interest.

Kate Deutscher is technical team lead at GreenSync, where she builds systems to make renewable energy sources reliable. Previously, she worked on a delivery engineering team at realestate.com.au, where she focused on building automation tooling that integrates with tech like Docker, AWS, and Kubernetes to make building, packaging, and deploying applications easy, and spent the first six years of her career as a developer, which gives her keen insight into delivery issues and a passion for writing the clean, well-tested, and maintainable code that is often missing from automation tooling. Kate is determined to make a difference by introducing more women into the tech community; she organizes free training events like Rails Girls and DevOps Girls in her hometown of Melbourne, Australia.

Presentations

Automation run rampant Session

Kate Deutscher explores common pitfalls to automating software delivery and explains how to find the processes in your delivery pipeline that can benefit the most from automation, focusing on three patterns commonly seen in automation tooling, backed by real-world case studies of when each pattern has worked well, and when it has ended in rampant failure.

Frederik Deweerdt is a software engineer at Fastly. He has been a developer for nearly 20 years, mostly in C and a bit of Go. He has worked on telecom systems for most of his career: from a fast carrier-grade SMTP/DNS/SMS server to a satellite-oriented TCP/HTTP proxy. He now leads the HTTP/2 efforts at Fastly, the real-time CDN.

Presentations

HTTP/2 development: A hands-on approach Tutorial

HTTP/2 (or H2, as the cool kids call it) has been ratified for months, and browsers already support it. But do the exciting features that HTTP/2 offers meet expectations? Frederik Deweerdt walks you through how HTTP/2 fares in the real world, how browser behavior is changing to accommodate new server-side functionality, and how you can get the most out of the new protocol everybody’s talking about.

Rob Dickinson is a systems architect and software engineer on Intel’s NVML development team as well as the technical lead for pmemkv, a key-value datastore optimized for persistent memory. Previously, Rob led development of end-user monitoring products at Quest Software and Dell Software but won’t admit to how many startups it took to get there. Rob has never met a programming language he didn’t like. He lives with his wife, three kids, and snowboard collection in Boulder, Colorado.

Presentations

Four things I wish I'd known sooner about persistent memory Session

On the surface, adapting software to use persistent memory seems obvious. After all, persistent memory is simply fast memory that maintains state when the power goes out, like an SSD. But unlike SSDs, persistent memory challenges long-held ideas and conventions about how software works. Rob Dickinson outlines four key ideas that will help focus your persistent memory strategy.

Kellan Elliott-McCrea is the senior vice president at Blink Health, where he runs engineering, product, and design. Kellan spends a lot of time experimenting with how to get the best results from cross-functional teams building software. Previously, he took 18 months off to be a stay-at-home dad, taught engineering leadership seminars, was CTO at Etsy, and spent five years at Flickr doing what needed to be done. He’d like to apologize for coauthoring OAuth.

Presentations

You're not an architect, and this is not a bridge we're building: Leading technical decision making for high-performing teams Session

Kellan Elliott-McCrea explains how to lead technical decision making for high-performing teams.

Bret Fisher is a Virginia Beach-based freelance DevOps and Docker consultant, trainer, speaker, and open source volunteer. Bret has been a cloud and data center ops and system administrator for 20 years. Currently, he helps teams Dockerize their apps and systems and improve their speed of deployment, resiliency, metrics, and awareness (all that DevOps-y stuff). Bret is a Docker Captain and Code for America Brigade Captain. He runs several monthly meetups, speaks at conferences, and is obsessed with containerizing any app he sees. (He’ll likely talk your ear off about it next time you meet.) Bret also develops in Node.js, Bash, and general web, usually for open source projects. In his free time, he does CrossFit, surfs a little, geeks out in the awesome local dev community in Virginia Beach, and travels with his wife.

Presentations

Docker production: Orchestration, security, and beyond Tutorial

Starting where previous Docker workshops leave off, Bret Fisher, Shawn Bower, and Tony Pujals dive into the new Swarm mode clustering (services), failover, blue-green deployments, monitoring, logging, troubleshooting, and security, covering the latest built-in features and common third-party tools as they walk you through installing them on your own five-node cloud Swarm cluster.

Meet the Experts with Bret Fisher and Shawn Bower Meet The Experts

Want to know how to improve your local development workflow, how to deploy Docker Swarm, or how to deal with other stumbling blocks on the way to deploying containers? Come chat with Bret and Shawn about all things Docker and containers. No question is too big or too small.

Andrew Fong is an engineering director at Dropbox, where he is responsible for the Development Infrastructure Platform. In the past, he was responsible for storage, telemetry, and production systems. Previously, he was an SRE at YouTube and AOL.

Presentations

Best practices in data migrations Session

In 2016, Dropbox migrated 600 petabytes of data from managed cloud storage into its own data centers. Andrew Fong shares lessons and best practices for data migrations learned from this experience.

Jessica Frazelle is a software engineer at Microsoft, where she works with Linux and containers. Jess loves all things involving Linux namespaces and cgroups and is probably best known for running desktop applications in containers. Jessica has been a maintainer of Docker and a contributor to RunC, Kubernetes, Linux, and Golang, among other projects, and maintained the AppArmor, seccomp, and SELinux bits in Docker. She is quite familiar with locking down containers.

Presentations

Meet the Experts with Jessica Frazelle Meet The Experts

Bring all of your questions (and thoughts) on containers, security, Linux, or Golang to Jess to get the answers you need.

The role of open source in a company Keynote

Jessica Frazelle and Dino Dai Zovi—technologists who have spent their careers managing the balancing act between community and commercial perspectives—discuss how to be effective at open source in your company.

Nikhil Garg is an engineering manager at Quora, where he has led the quality, ads, and ML platform teams, among others. He is interested in the intersection of machine learning, distributed systems, and human psychology.

Presentations

Scaling the machine learning systems powering Quora's home feed Session

Millions of people visit Quora's home feed to find high-quality content personalized to their interests. It is powered by a highly performant distributed system running sophisticated ML algorithms. Nikhil Garg and Neeraj Agrawal describe the evolution of the home feed's architecture and share several lessons from building and scaling this system.

Felix Glaser is a production engineer at Shopify, where he works on networking and security-related applications. Previously, Felix ran and sold his own startup, room.me, which matched roommates. In his free time, he organizes and plays CTFs for fun.

Presentations

Automated bot squashing: How to build your own bot fighting infrastructure Session

During flash sales, when milliseconds matter, bots buy faster than humans. These bots created a constant load on Shopify’s infrastructure and SREs—until the company decided to create an automated system to detect and block nearly all bot traffic on its load balancers. Felix Glaser offers an overview of this system and shares the challenges Shopify faced differentiating between bots and humans.

Sebastien Goasguen is senior director of cloud technologies at Bitnami, where he leads all the Kubernetes efforts. Sebastien joined Bitnami through the acquisition of his startup Skippbox. Sebastien is a 20-year open source veteran. A member of the Apache Software Foundation, he worked on Apache CloudStack and Libcloud for several years before diving into the container world. He is an avid blogger and enjoys spreading the word about new cutting-edge technologies. He also trains developers and sysadmins on all things Docker and Kubernetes. Sebastien is the author of the O’Reilly Docker Cookbook and 60 Recipes for Apache CloudStack.

Presentations

Kubernetes training 2-Day Training

Kubernetes, one of the highest velocity projects on GitHub, is quickly becoming the leading platform on which to build distributed applications. Sebastien Goasguen offers a Kubernetes primer, covering the architecture of a Kubernetes installation, the API objects that make up a distributed application on Kubernetes, and more.

Joe Goldberg is the lead solutions marketing manager at BMC Software, where he helps BMC products leverage new technology to deliver market-leading solutions with a focus on workload automation and big data. Joe has more than 35 years of experience in the design, development, implementation, sales, and marketing of enterprise solutions to Global 2000 organizations.

Presentations

You scream for microservices orchestration; I scream for batch; we all scream for jobs as code (sponsored by BMC Software) Keynote

Business transformation has led us to adopt new technologies and process and cultural changes. How batch application automation is built, tested, and run must evolve to keep pace. Joe Goldberg explores jobs as code, which looks at batch application automation from an SDLC perspective—an approach that embeds expectations within a modern automation platform.

Dina Goldshtein is a senior software engineer at Riverbed, which she joined through the acquisition of Aternity, a company that builds performance monitoring tools that run on millions of PCs and mobile devices. Dina is on the team responsible for the core collection mechanism, which hooks low-level Windows events and collects performance information from a variety of sources; she works on boot performance monitoring, identifying bottlenecks in the Windows boot process, and monitoring user experience on the web. Previously, Dina worked at BrightSource Energy, where she led the software foundations team, developing software infrastructure used by the entire R&D department. She was also responsible for bringing in new technologies, mentoring, and improving quality and development processes department-wide.

Presentations

ETW: Monitor anything, anytime, anywhere Session

Event Tracing for Windows (ETW) is the most important diagnostic tool Windows developers have at their disposal. Dina Goldshtein explores the rich and wonderful world of ETW events, which span numerous OS components. You’ll learn how to diagnose complex issues in production systems and discover ways to automate ETW collection and analysis to build self-diagnosing applications.

Sasha Goldshtein is the CTO of Sela Group, a Microsoft C# MVP and Azure MRS, a Pluralsight author, and an international consultant and trainer. Sasha’s consulting work revolves mainly around distributed architecture, production debugging, and mobile application development. Sasha is the author of Introducing Windows 7 for Developers (Microsoft Press) and Pro .NET Performance (Apress). He is also a prolific blogger and the author of numerous training courses, including .NET Debugging, .NET Performance, Android Application Development, and Modern C++.

Presentations

Fast and safe production monitoring of JVM applications with BPF magic Session

Sasha Goldshtein explores a holistic set of BPF-based tools for monitoring JVM applications on Linux and outlines a systems performance checklist that includes classics like fileslower, opensnoop, and strace—all based on the noninvasive, fast, and safe BPF technology.

Linux performance monitoring with BPF Tutorial

Sasha Goldshtein leads a hands-on workshop on Linux dynamic tracing. You'll explore the BPF Compiler Collection (BCC), a set of tools and libraries for dynamic tracing, and gain firsthand experience of memory leak analysis, generic function tracing, kernel tracepoints, static tracepoints in user-space programs, and the baked-in tools for file I/O, network, and CPU analysis.

Margaret Gourlay is a senior QA engineer at VictorOps. Margaret’s background is in video game quality assurance. She holds a degree in theoretical math. When she’s not working, she enjoys playing games with friends far and wide, learning new things, and playing outside with her kids.

Presentations

Lessons in interpersonal dynamics from massively multiplayer online games Session

In 2005, a World of Warcraft bug helped epidemiological research in unexpected ways. Margaret Gourlay draws on this research to share insight into what works and what doesn’t for functional teams and explains how using these ideas has helped VictorOps strategically grow its engineering team in unexpected ways.

Michael Hausenblas is a developer advocate for OpenShift and Kubernetes at Red Hat, where he helps app ops engineers build and operate distributed services. Michael shares his experience with distributed systems and large-scale data processing through demos, blog posts, and public speaking engagements and contributes to open source software such as OpenShift and Kubernetes. Previously, Michael was a developer advocate at Mesosphere, chief data engineer at MapR Technologies, and a research fellow at the National University of Ireland, Galway, where he researched large-scale data integration and the internet of things and gained experience in advocacy and standardization (World Wide Web Consortium, IETF).

Presentations

Let’s Go! Using the Go programming language for system tasks 2-Day Training

Go has established itself as a popular language for systems programming, services, and tools, and more people are using Go for tasks that would traditionally have been solved using Python or Ruby. Michael Hausenblas teaches you Go from scratch and walks you through how to use it for system tasks such as batch file operations, container inspection, and access control automation.

Meet the Experts with Michael Hausenblas Meet The Experts

Michael is a Gopher. If you know what that means, you'll likely want to talk with him. He's also into all sorts of cloud-native topics, including containers (CRI-O, Docker, etc.), Kubernetes, OpenShift, Prometheus, and functions as a service (from AWS Lambda to OpenWhisk).

Kelsey Hightower has worn every hat possible throughout his career in tech but most enjoys leadership roles focused on making things happen and shipping software. Kelsey is a strong open source advocate focused on building simple tools that make people smile. When he is not slinging Go code, you can catch him giving technical workshops covering everything from programming and system administration to his favorite Linux distro of the month.

Presentations

What's new in Kubernetes 1.8 (sponsored by Google Cloud) Session

Kubernetes has become the go-to open source framework for managing containers and building application platforms that scale from 1 to 5,000 machines. Kelsey Hightower offers an overview of the Kubernetes 1.8 release and explains why this trend will continue.

Lara Hogan is the vice president of engineering at Kickstarter and the author of Designing for Performance and Demystifying Public Speaking. Lara champions performance as a part of the overall user experience, striking a balance between aesthetics and speed and building performance into company culture.

Presentations

Meet the Experts with Lara Hogan Meet The Experts

Join Lara to discuss public speaking, mentorship and sponsorship, engineering management, and last but not least, donuts.

Mentorship and sponsorship Keynote

To grow your technical leadership skills, it's critical to lean on your network of support. Mentors—people who can give you helpful advice—are usually easy enough to find. Lara Hogan explains that what can be even more valuable is finding sponsors—people who can help you find new opportunities and improve the visibility of your work.

Jonah Horowitz is a site reliability engineer at Stripe, where he works with all of the company’s individual engineering teams to drive reliability efforts, including monitoring, alerting, deployment pipelines, and chaos resiliency. Previously, Jonah worked at several startups around the Bay Area, including Netflix, Quantcast (a leading ad-tech startup, where he grew the company’s network to process over three million events per second), and Looksmart (a contextual advertising company), and was on the founding team of Walmart.com (now @Walmart Labs), where he built out the company’s software deployment pipelines and its product image management systems.

Presentations

Persistent SRE anti-patterns: Pitfalls on the road to creating a successful SRE program like Netflix and Google Session

People aren't just wrong on the internet. Sometimes they bring it back to the office. Blake Bisset and Jonah Horowitz share stories about anti-patterns in monitoring, incident response, configuration management, and more and explain how Google and Netflix view the role of the SRE (and how it differs from the traditional system administrator role).

Won Jun Jang is an observability engineer at Uber Technologies, working on distributed tracing, monitoring, and performance. In his spare time, he gets lectured by his life coach to write a more interesting bio to sell himself better.

Presentations

From zero to distributed traces: An OpenTracing tutorial Tutorial

Yuri Shkuro, Bryan Liles, Won Jun Jang, and Prithvi Raj walk you through implementing distributed tracing in modern applications, using the CNCF’s OpenTracing project. You'll explore a set of sample applications and learn how to instrument them for tracing. You'll also use a tracing system such as Jaeger, Zipkin, or LightStep to visualize complex transactions that might span multiple processes.

Oded Keret is a senior product manager at Micro Focus leading market research, roadmap definition, strategy, and innovation for the Performance Engineering product line.

Presentations

Scaling up for performance engineers: Practicing what we preach (sponsored by Micro Focus) Session

Oded Keret shares HPE's performance testing experience, the challenges the company overcame, and the lessons learned along the way.

Karthik Kirupanithi is a software development engineer at Amazon Web Services. Previously, Karthik worked for Amazon, Microsoft, Thomson Reuters, and Dow Jones. He enjoys solving everyday problems and automating and simplifying everyday work; his recent interests include systems management and Alexa skill development.

Presentations

Systems management with a voice UI using Amazon Alexa Session

Voice UIs like Amazon's Alexa can make systems management simple, intuitive, and delightful. The virtual private assistant feel of a VUI, coupled with the abstraction that voice commands bring, breaks the tedium of management tasks. Karthik Kirupanithi demonstrates how to put together an Alexa skill that can perform tasks using the EC2 Systems Manager.

Ignat Korchagin is a security engineer at Cloudflare, where he works mostly on platform and hardware security. Ignat’s interests are cryptography, hacking, and low-level programming. Previously, he was senior security engineer for Samsung Electronics’s Mobile Communications Division, and his solutions can be found in many older Samsung smartphones and tablets. Ignat started his career as a security researcher in the Ukrainian government’s communications services.

Presentations

Managing server secrets at scale with a vaultless password manager Session

Ever wondered how to quickly and efficiently roll over all of your servers’ SSH keys or how to securely manage diskless systems? Ignat Korchagin outlines a simple approach that combines hardware support and a little cryptography to help operationalize the management of all the secrets in your cloud.

John Le Drew is founder and principal at consulting firm Wise Noodles, where he focuses on facilitating a safe, creative, collaborative environment. John’s move into consultancy has taught him the value of team dynamics and that most technical challenges are projections of underlying issues with collaboration. Previously, John spent most of the last two decades working in the software industry, focusing on web technologies, which included 10 years as a software engineer.

Presentations

Pay attention: Why you should care about psychological safety Session

John Le Drew draws on the hours of interviews he conducted with some of the most respected people in the industry for the Agile Path podcast to explain what psychological safety is and why you should care about it, as he walks you through a series of highly interactive role-playing and improvisation exercises.

Claire Le Goues is an assistant professor in the School of Computer Science at Carnegie Mellon University, where she researches ways to automatically and confidently evolve, debug, maintain, and improve real-world software systems. Claire also directs CMU’s undergraduate research and education programs in software engineering and is passionate about training software engineering practitioners and researchers from all backgrounds and walks of life.

Presentations

FTFY: Research advances in automatic bug repair Keynote

Claire Le Goues shares recent advances in academic software engineering and programming languages research that aim to make automatic bug repair a reality, using everything from metaheuristic search to program synthesis to machine learning and search over big databases of existing code.

Meet the Experts with Claire Le Goues Meet The Experts

Come talk with Claire about software quality—especially automated debugging. There is no question too big or small. She is happy to discuss any kind of development challenge people face and the ways that academic research can (or can't) help them.

Richard Li is cofounder and CEO of Datawire. A veteran of several successful high-technology startups, Richard was previously vice president of product and strategy at Duo Security and vice president of strategy and corporate development at Rapid7. He also led the creation of the original product management organization at Rapid7. Richard has also held a number of leadership positions in sales, marketing, and engineering at Red Hat. He is a recognized microservices expert and has spoken at the Microservices Practitioner Summit, ApacheCon, and Boston DevOps Days. He holds both a BS and MEng in computer science from MIT.

Presentations

Developing resilient microservices with Kubernetes and Envoy Tutorial

Microservices are an increasingly popular approach to building cloud-native applications, and dozens of new technologies that streamline adopting microservices development, such as Docker, Kubernetes, and Envoy, have been released over the past few years. Phil Lombardi, Rafael Schloming, and Richard Li walk you through actually using these technologies to develop, deploy, and run microservices.

Bryan Liles is a principal engineer on the cloud engineering team at Capital One. When not helping a huge bank move to the public cloud, he gets to speak at conferences on topics ranging from machine learning to building the next generation of developers. In his free time, Bryan races cars in straight lines and around turns and builds robots and devices.

Presentations

From zero to distributed traces: An OpenTracing tutorial Tutorial

Yuri Shkuro, Bryan Liles, Won Jun Jang, and Prithvi Raj walk you through implementing distributed tracing in modern applications, using the CNCF’s OpenTracing project. You'll explore a set of sample applications and learn how to instrument them for tracing. You'll also use a tracing system such as Jaeger, Zipkin, or LightStep to visualize complex transactions that might span multiple processes.

Meet the Experts with Bryan Liles Meet The Experts

Stop by to hear Bryan explore paths for starting a career in systems engineering, ideas on where we could go in the future, and how words, technology, and empathy impact the people and projects we interact with every day.

Sysadmins and DevOps and SREs, oh my! Session

Our industry is continuing to mature, and there is a path for you. Bryan Liles explores paths for starting a career in systems engineering, ideas on where we could go in the future, and how words, technology, and empathy impact the people and projects we interact with every day.

Ben Linsay is an engineer at Bumpers. Previously, Ben was an engineer at Kickstarter, Aggregate Knowledge, and Boundary.

Presentations

Putting your first paper into production Session

Machine learning is as accessible as it has ever been, but it’s not always obvious how to go from a cool paper to serving production traffic. Ben Linsay helps you get started putting your paper into production, sharing lessons learned solving real problems with machine learning at Kickstarter.

Phil Lombardi is a senior platform engineer at Datawire, where he is building the cloud services for Datawire.io’s resilient microservices framework. Phil has extensive experience building and operating distributed systems and continuous delivery pipelines in both the internet of things and web services spaces.

Presentations

Developing resilient microservices with Kubernetes and Envoy Tutorial

Microservices are an increasingly popular approach to building cloud-native applications, and dozens of new technologies that streamline adopting microservices development, such as Docker, Kubernetes, and Envoy, have been released over the past few years. Phil Lombardi, Rafael Schloming, and Richard Li walk you through actually using these technologies to develop, deploy, and run microservices.

Kelly Looney is the director of DevOps consulting at Skytap and a frequent public speaker at events like DevOps Days, as well as Skytap-hosted panels and roundtable discussions. Previously, Kelly led a large-scale Agile and DevOps transformation at bwin.party in Vienna, Austria, that involved over 900 development and operations personnel. Kelly has varied experience with all types of software development and operations, across many business areas at organizations such as Capgemini, Booz Allen Hamilton and, more recently, the Agile- and DevOps-focused firms ThoughtWorks and Valtech, where he consulted with development organizations all over the world. Kelly has personally worked with such industry luminaries as Dick Gabriel, Kent Beck, Jez Humble, and Luke Hohmann. He has a great grasp of both the theory and practical problems that go along with the transition to Agile and DevOps.

Presentations

How do you eat a whale? One byte at a time Session

Kelly Looney shares an incremental approach to introducing containers into complex, distributed applications—resulting in modernization with less risk and more reward. You’ll learn how to evaluate which components of your applications are best suited for containers, how to experiment safely and get fast feedback, and how to increase and scale your container adoption.

Mark McBride is founder and CEO of Turbine Labs, building products that help engineers ship features more quickly and safely. Previously, Mark was services engineer lead at Nest Labs and Google, where he was responsible for the development of Nest’s server infrastructure that makes it possible for Nest customers to connect with their homes from wherever they are, and an early developer on Twitter’s streaming API, delivering thousands of messages per second in real time to millions of users. During his time at Twitter, Mark managed developer productivity and led the web delivery, developer tools, and infrastructure test teams; he also worked with a variety of deploy pipelines and led development of some of Twitter’s early service migrations, which grew into a suite of tools used to migrate millions of requests per second from legacy services to modern replacements.

Presentations

Customer-centric observability Session

With the recent flourishing of observability systems, there's no shortage of things to monitor. Sadly, humans have limited capacity to process them all. Mark McBride outlines three key metrics—request rate, success rate, and the latency histogram—that provide a high-level abstraction of the customer experience. If these three metrics are good, your system is healthy from a customer perspective.

Duncan McAllister is a senior enterprise architect on the global consulting services team at Akamai, where he focuses on web performance. Previously, he was an enterprise architect supporting the architecture and platform domain for one of the top US retail enterprises, where he was responsible for the deployment and integration architecture and championed several foundational initiatives, including a hybrid-cloud-hosting migration, a multiregion deployment architecture, and a continuous application delivery pipeline.

Presentations

A reference architecture to automate content delivery into your CI/CD workflows (sponsored by Akamai) Session

CDN automation and pipeline integration can often be a daunting task. Too often these services are integrated late in the delivery process, traditionally in the QA or production deployment phases. Duncan McCallister and Akshay Ranganath share approaches that account for CDNs much earlier in the development lifecycle and highlight specific considerations around CI/CD pipeline integration.

Mike McGarr is the engineering manager for the developer productivity team at Netflix and a cohost of the Ship Show podcast. Mike has been developing Java/JVM-based applications for most of his career and has been known to dabble in other languages as well. Previously, Mike was the director of DevOps at Blackboard and the founder of the DC Continuous Delivery meetup. He is passionate about building quality software through automation. Mike can frequently be found talking about Agile, continuous delivery, DevOps, or build and test automation.

Presentations

The rise of polyglot programming at Netflix Session

Netflix has always been a Java shop, from its early DVD days to its migration to the cloud. This simplified the job for centralized teams, but as the popularity of non-JVM languages rose, these teams have begun to rethink their support strategy. Mike McGarr discusses the early days of Netflix's polyglot journey and where the company is going in the future.

Tyler McMullen is CTO of Fastly, where he is responsible for the system architecture and leads the company’s technology vision. As part of the founding team, Tyler built the first versions of Fastly’s instant purging system, API, and real-time analytics. Before Fastly, Tyler worked on text analysis and recommendations at Scribd. A self-described technology curmudgeon, Tyler has experience in everything from web design to kernel development and loathes all of it. Especially distributed systems.

Presentations

Building a skyscraper with Legos: The anatomy of a distributed system Session

Many words have been spilled about distributed systems. Most of the time though, what we talk about are algorithms and techniques. But the practical realities of distributed systems are far from straightforward. Tyler McMullen outlines a new approach built to perform very high volumes of health checks across a cluster of machines for reliability and scalability.

Carin Meier is a software developer at Cognitect. Carin started as a professional ballet dancer, but she studied physics in college and has been developing software for both the enterprise and entrepreneur ever since. Carin has a strong background in Ruby and Clojure and is interested in combining hardware and software at the intersection of the physical and digital world. She has helped clients develop home automation systems and written a control library for the Parrot AR Drone in Clojure. She is highly involved in the developer community and has spoken at many conferences, including keynoting at OSCON and Strange Loop. Carin also helps lead the Cincinnati Functional Programmers and is the author of Living Clojure.

Presentations

Unconventional programming paradigms for the future, now Keynote

As technology advances, our systems are growing more and more complex, reaching the threshold of what we can handle and even comprehend. We need more than tools to keep it under control. We need new ways of thinking. Carin Meier explores new ways to approach systems and tame complexity for the rapidly changing future.

Terran Melconian is a data science consultant and trainer at Air Network Simulation and Analysis. Terran has worked in the consumer web space for the last decade in roles that included software development, operations, data warehousing, and data science. Previously, he built up and managed teams at TripAdvisor and Jobcase, where he focused on hiring generalists and teaching them about the specifics. Terran is passionate about continuing education for high-performing professionals.

Presentations

Debugging complex systems Session

Terran Melconian explores an organized process for observing a misbehaving complex system, reasoning about possible causes, and isolating the fault. While it is not generally taught, all the successful senior engineers with operational experience Terran has talked to use a variant of this process.

Jon Moore is the chief software architect at Comcast Cable, where he focuses on delivering a core set of scalable, performant, robust software components for the company’s varied software product development groups. Jon specializes in the “art of the possible,” finding ways to coordinate working solutions for complex problems and deliver them on time. He is equally comfortable leading and managing teams and personally writing production-ready code and has a passion for software engineering, continuously learning, and teaching colleagues new ways to deliver working, maintainable software with ever-higher quality and ever-shorter delivery times. His interests include distributed systems, fault tolerance, building healthy and engaging engineering cultures, and Texas Hold’em. Jon holds a PhD in computer and information science from the University of Pennsylvania. He resides in West Philadelphia, although he was neither born nor raised there and does not spend most of his days on playgrounds.

Presentations

The art of the possible Session

How does a large 50-year-old company go from purchasing much of its technology and working with yearlong release cycles to building multiple products in-house and releasing daily? Jon Moore traces the changing set of tools, techniques, and attitudes that have powered (and still power) this transformation at Comcast over the last decade, mapping out a path you can follow in your own company.

Neha Narula is director of research at the Digital Currency Initiative (part of the MIT Media Lab), where she teaches courses and leads cryptocurrency and blockchain research. Previously, she built fast, scalable databases and secure software systems. She speaks about these topics at industry and research conferences. In a previous life, Neha helped relaunch the news aggregator Digg and was a senior software engineer at Google, where she designed Blobstore, a system for storing and serving petabytes of immutable data, and worked on Native Client, a system for running native code securely through a browser. She holds a PhD in computer science from MIT.

Presentations

Blockchains and cryptocurrencies: New paradigms for shared data Keynote

Bitcoin showed us a new way of moving value around the internet without intermediaries. Neha Narula explains how this paradigm might apply to our traditional ways of thinking about databases that cross organizational boundaries. As data on the web becomes consolidated around a few key players, the blockchain might help users gain more control.

Lex Neva is a site reliability engineer at Heroku. Lex originally trained in computer science, but he found that he most enjoyed applying his software engineering skills to operations. Previously, he kept large services running at Linden Lab’s Second Life and DeviantArt.com. A veteran of many large incidents, Lex has strong opinions on incident response, retrospectives, on-call sustainability, and good development and release processes.

Presentations

The phone book is on fire: Lessons from the Dyn DNS DDoS attack Session

When the DDoS attack crushed Dyn last October, did your DNS fail? Heroku's sure did. In response, Lex Neva deep dove into everything DNS to learn how to implement resilient DNS properly—reading RFCs, asking questions of pros, and performing real-world experiments when no one knew the answers. Join Lex to find out what does work and all the crazy details of DNS that he uncovered.

Deb Nicholson is the community outreach director for the Open Invention Network—the defensive patent pool built to protect Linux projects. She is also the community manager for GNU MediaGoblin, a brand-new federated media hosting program. Deb works at the intersection of technology and social justice. She has over 15 years of nonprofit management experience and got involved in the free software movement about five years ago when she started working for the Free Software Foundation. In her spare time, Deb serves on the board of OpenHatch, a small nonprofit dedicated to identifying and mentoring new free software contributors, with a particular interest in building a more diverse free software movement.

Presentations

Find your way: Orienteering for managers Session

Are you managing distributed teams with very different stakeholders—perhaps even a mix of hobbyists and paid staff? It probably seemed easy at first, but the further you travel, the more unfamiliar the terrain appears. Luckily, this is not new ground. Many have gotten lost here before and found their way out again. VM Brasseur and Deb Nicholson share a map to productive, happy teams.

Meet the Experts with VM Brasseur and Deb Nicholson Meet The Experts

Deb and Vicky will answer all your questions about team composition, conflict resolution, communication strategies, goal and expectation setting, or anything else dealing with creating a healthy cross-functional team.

Michelle Noorali is a senior software engineer on the Azure Container Service team at Microsoft and a core maintainer on the Kubernetes Helm project, a package manager for Kubernetes. Michelle also co-leads SIG-Apps, the Kubernetes special interest group for running and managing applications on Kubernetes. She is primarily a Go developer but has Ruby roots. Michelle is passionate about developer experiences.

Presentations

Managing applications on Kubernetes with Helm Session

Container orchestration platform Kubernetes has seen unprecedented traction and adoption in the last few years. However, it can be difficult to figure out how to actually deploy your applications on Kubernetes if you're new to the space. Michelle Noorali walks you through configuring, deploying, and managing applications on Kubernetes using an open source tool called Helm.

Alex Petrov is a lead data infrastructure engineer at DataStax. A longtime Cassandra user, Alex was a 2015 Cassandra MVP. He’s also a Project Reactor committer. Recently, Alex has been working on data processing pipelines and near-real-time processing backends.

Presentations

What we talk about when we talk about on-disk storage Session

In the world of big and fast data, it's important to be fluent in storage and know the right tools for each job. Alex Petrov shares techniques for picking the right database and indexes, understanding the trade-offs different types of storage bring, scaling out your data and planning its growth, and finding the best resources on the subject.

Lisa Phillips is vice president of site reliability engineering at Fastly. With a focus on internet and web technologies and an emphasis on systems and database administration, architecture, engineering, and management, Lisa isn’t afraid of hard problems or scale. She brings extensive experience in the implementation and management of internet services to ensure the highest levels of system availability and performance globally.

Presentations

Managing grumpy to build stronger teams Session

Lisa Phillips shares strategies for overcoming individual and organizational management challenges in a globally diverse environment and explores people management challenges and methods to work with the grumpiest admin.

Guy Podjarny is a cofounder and CEO at Snyk.io, where he focuses on securing open source code. He was previously CTO at Akamai and founder of Blaze.io. He also worked on the first web app firewall and security code analyzer. Guy is a frequent conference speaker, the author of Responsive & Fast, High Performance Images, and the upcoming Securing Third Party Code, and the creator of Mobitest. He also writes on Guypo.com and Medium.

Presentations

Serverless security: What's left to protect? Session

Serverless means handing off server management to the cloud platforms—along with their security risks. With the “pros” ensuring our servers are patched, what’s left for application owners to protect? As it turns out, quite a lot. Guy Podjarny explores the aspects of security serverless doesn’t solve, the problems it could make worse, and the tools and practices you can use to keep yourself safe.

Ilan Rabinovitch is director of technical community and evangelism at Datadog. Previously, Ilan spent a number of years leading infrastructure and reliability engineering teams at organizations such as Ooyala and Edmunds.com. He’s active in the open source and DevOps communities, where he is a co-organizer of events such as SCALE, Texas Linux Fest, DevOpsDays LA, and DevOpsDays Silicon Valley.

Presentations

Monitoring containers: Follow the data Session

Drawing on real-world metrics data from thousands of organizations, Ilan Rabinovitch shares the latest trends in container adoption and use, explores the types of applications organizations are running in containers, and explains how to best monitor these containerized applications.

Prithvi Raj is an observability engineer working on Uber’s distributed tracing system, Jaeger.

Presentations

From zero to distributed traces: An OpenTracing tutorial Tutorial

Yuri Shkuro, Bryan Liles, Won Jun Jang, and Prithvi Raj walk you through implementing distributed tracing in modern applications, using the CNCF’s OpenTracing project. You'll explore a set of sample applications and learn how to instrument them for tracing. You'll also use a tracing system such as Jaeger, Zipkin, or LightStep to visualize complex transactions that might span multiple processes.

Akshay Ranganath is an enterprise architect and web performance consultant with over 15 years’ experience in the world of IT and performance consulting. He works to make websites blazingly fast while remaining secure, and his DevOps consultancy focuses on integrating CDN into CI/CD pipelines and enabling some of Akamai’s largest customers to build complex routing solutions and implement innovative business rules at the CDN layer.

Presentations

A reference architecture to automate content delivery into your CI/CD workflows (sponsored by Akamai) Session

CDN automation and pipeline integration can often be a daunting task. Too often these services are integrated late in the delivery process, traditionally in the QA or production deployment phases. Duncan McCallister and Akshay Ranganath share approaches that account for CDNs much earlier in the development lifecycle and highlight specific considerations around CI/CD pipeline integration.

Anant Rao is an engineering lead at LinkedIn, where he works on performance optimization and capacity planning, focusing on making LinkedIn’s apps go fast and working on infrastructure to prevent performance issues before they make it to production.

Presentations

How LinkedIn determines the capacity limits of its services using live traffic Session

Susie Xia and Anant Rao explain how LinkedIn leverages live production traffic to determine service and resource bottlenecks at scale with a tool called Redliner and how you can use your current architecture to do the same.

Tanya Reilly is a system administrator and site reliability engineer at Google, where she works on low-level infrastructure like distributed locking, load balancing, and bootstrapping. Previously, she was a system administrator at Eircom.net, Ireland’s largest ISP, and the entire IT department for a small software house.

Presentations

Have you tried turning it off and turning it on again? Session

Tanya Reilly explores the parts of disaster recovery you might be less prepared for, covering why the best laid fallback plans tend to go wrong and why you should start deliberately managing your dependencies long before you think you need to.

Robert Reta is a senior software engineer on Netflix’s playback licensing team.

Presentations

Event sourcing on a global scale: Netflix downloads Session

The Netflix download feature allows users to download content for offline playback. Implementing this feature required a new persistence architecture to maintain the state of user devices and content licenses. Joseph Breuer and Robert Reta explore the technical decisions behind the choice of a Cassandra event sourcing data store.

Liz Rice is the technology evangelist at container security specialists Aqua Security. Previously, she cofounded container startup Microscaling Systems, which built a real-time scaling engine and the popular image inspector MicroBadger. Liz has a wealth of software development, team, and product management experience from her years spent working on network protocols and distributed systems and in digital technology sectors such as VOD, music, and VoIP. When not building startups and writing code, Liz loves riding bikes in places with better weather than her native London.

Presentations

Your (container) secret's safe with me Session

In a containerized deployment, how do you safely pass secrets like passwords and certificates between containers without compromising their safety? If orchestration means a container can run on any machine in the cluster, how do you minimize who knows your secrets? Liz Rice explores the risks and shares best practices for keeping your secrets safe.

Nick Rockwell is the chief technology officer at the New York Times. Throughout his career, Nick has worked at the intersection of media and the internet, building digital products for mass audiences. Previously, he was chief technology officer at Condé Nast and digital CTO at MTV Networks.

Presentations

What if serverless was real? Keynote

For most of us, the best approach to scaling complex distributed systems is to not do it at all. Nick Rockwell asks, so why isn't serverless a bigger deal?

Andrew Rodland is a senior backend engineer for video systems at Vimeo. For over a decade, Andrew has been mixing the chocolate of dev in the peanut butter of ops. Previously, he was a developer at Shutterstock. Andrew has spoken at (and organized streaming video for) prominent Perl conferences and contributed some cool stuff to CPAN.

Presentations

Load balancing, consistent hashing, and locality Session

Serving a billion requests per day with a dynamic video packager makes unique demands on a load balancer. Andrew Rodland shares a new consistent hashing algorithm developed by Google researchers that helped improve cache locality and optimize delivery—and made a contribution to open source software in the process.

Alex Rukletsov is an Apache committer and Mesos PMC member at Mesosphere. He loves making programs run faster, reducing the cognitive load of code, and creating the right abstractions. In a previous life, Alex segmented medical images and investigated the behavior of human vessels at several German research institutes. His areas of interest include distributed systems, object recognition, and probabilistic and heuristic algorithms.

Presentations

Health checking: A not-so-trivial task in the distributed containerized world Session

Application health checking and probing have existed since the dawn of computer science. Usually seen as a trivial task, health checking becomes more involved when applied to distributed cloud-native apps. Alexander Rukletsov discusses the challenges and perils of modern health checking and shares lessons learned during the revamp of the Apache Mesos health checks subsystem.

Cynthia Savard Saucier is director of design at Shopify, where she has sprinkled her creativity in many important projects, notably the company’s web and mobile interfaces. Passionate about human beings and their means of communication, Cynthia has always sought a deeper understanding of how people think, interact, and connect. In a field that is not always fully understood, she excels by creating smart, emotional connections between companies and users. Cynthia has a knack for strategic design, ergonomics, and problem solving. Her wide range of experience has brought her broad recognition as a leading expert on multiplatform interface design. She was awarded the 2010 RAÉDIUM prize for Chouette!, a technological communication platform to strengthen intergenerational relations. In addition to her day job, Cynthia mentors startups and is regularly invited to speak at events around the world, where her playful approach both startles and charms. In her conference presentations, she shares her passion for her point of view: user-centered design is a reality, not a utopian methodology. Cynthia holds a degree in industrial design from Université de Montréal.

Presentations

The impact of design: How design influences outcomes Keynote

We like to think that technology can make the world a better place, but we (conveniently) forget how it can make it worse. Primum non nocere (first do no harm) is the first concept taught in medical school, serving as a reminder of the possible harm that any intervention might do. Cynthia Savard Saucier challenges the tech industry to come up with its own fundamental principle.

Rafael Schloming is the CTO of Datawire. Rafael is a coauthor of the Advanced Message Queuing Protocol (AMQP) specification and the primary architect of the open source Apache Qpid Proton project. Previously, he was a principal software engineer at Red Hat, where he worked on messaging technologies.

Presentations

Developing resilient microservices with Kubernetes and Envoy Tutorial

Microservices are an increasingly popular approach to building cloud-native applications, and dozens of new technologies that streamline adopting microservices development, such as Docker, Kubernetes, and Envoy, have been released over the past few years. Phil Lombardi, Rafael Schloming, and Richard Li walk you through actually using these technologies to develop, deploy, and run microservices.

Baron Schwartz is founder and CEO of VividCortex, the best way to see what your production database servers are doing. He is the lead author of High Performance MySQL and has written a variety of open source software.

Presentations

Instrumenting systems for arbitrary observability Session

Observability (or lack thereof), like testability and maintainability, is a fundamental property of systems. But what does observable code look like? What instrumentation creates systems that are observable later in arbitrary ways, in circumstances you can't foresee? Baron Schwartz outlines the most useful things to know about observability in systems in production.

Yuri Shkuro is a staff engineer at Uber Technologies, working on distributed tracing, reliability, and performance. Yuri is the coauthor of the OpenTracing standard (a CNCF project) and a tech lead for Jaeger, Uber’s open source distributed tracing system.

Presentations

From zero to distributed traces: An OpenTracing tutorial Tutorial

Yuri Shkuro, Bryan Liles, Won Jun Jang, and Prithvi Raj walk you through implementing distributed tracing in modern applications, using the CNCF’s OpenTracing project. You'll explore a set of sample applications and learn how to instrument them for tracing. You'll also use a tracing system such as Jaeger, Zipkin, or LightStep to visualize complex transactions that might span multiple processes.

Julien Simon is a technical evangelist at AWS. Previously, Julien spent 10 years as a CTO and vice president of engineering at a number of top-tier web startups. He’s particularly interested in all things architecture, deployment, performance, scalability, and data. Julien frequently speaks at conferences and technical workshops, where he helps developers and enterprises bring their ideas to life thanks to the Amazon Web Services infrastructure.

Presentations

FPGAs in the cloud? Session

FPGAs have become a hot topic in the IT industry, thanks to the unprecedented computing power that they bring to demanding HPC applications, and AWS recently introduced FPGA-powered instances (aka F1 instances) to make the process simpler and quicker. Julien Simon walks you through building an FPGA-enabled application, from design to simulation to synthesis to execution on an F1 instance.

Michal Skiba is principal product line manager at Intel, where he leads product planning and marketing for Intel’s FPGA virtualization framework. Previously, he was a product line manager for Cisco’s data center switches. Michal holds an MASc from the University of Waterloo and a BASc from the University of Toronto. He has led several mountain-climbing expeditions in the Himalayas.

Presentations

Accelerating cloud applications with Intel FPGAs (sponsored by Intel) Session

Field-programmable gate arrays (FPGAs)—customizable digital circuits capable of processing large amounts of data incredibly quickly—have traditionally required deep expertise to program. Michal Skiba explains how Intel is helping developers accelerate their cloud applications through a software stack that greatly simplifies the use and management of FPGAs.

Ines Sombra is director of engineering at Fastly, where she spends her time helping the web go faster. Ines holds an MS in computology with an emphasis on cheesy ’80s rock ballads. She has a fondness for steak, fernet, and a pug named Gordo. In a previous life, she was a data engineer.

Presentations

Tuesday opening welcome Keynote

Mary Treseler, James Turnbull, and Ines Sombra welcome you to the first day of keynotes.

Wednesday opening welcome Keynote

Mary Treseler, James Turnbull, and Ines Sombra welcome you to the second day of keynotes.

Cindy Sridharan is an engineer at imgix, where she works on API development, infrastructure, and other miscellaneous backend engineering tasks. Cindy likes thinking about building resilient and maintainable systems and recently started writing about several of these topics.

Presentations

Monitoring in the time of cloud native Session

As the systems we build become more distributed and (in the case of containerization) ephemeral, traditional monitoring tools prove to be grossly insufficient. Fortunately, the state of monitoring has evolved to meet these new demands, but it brings its own set of technical and organizational challenges. Cindy Sridharan offers an honest overview of monitoring challenges and trade-offs.

Mike Strickland leads the FPGA high-performance computing vision within Intel’s Programmable Solutions Group. Mike has more than 20 years of computer, networking, and storage experience with companies such as Hewlett-Packard, Silverback Systems, and Altera (acquired by Intel). Previously, Mike led the development and launch of products in the networking, storage management, TCP/IP offload, and iSCSI spaces. He holds a BS in electrical engineering from Brown University and an MS in management from the Sloan School of Management at MIT.

Presentations

FPGA-accelerated data analytics (sponsored by Intel) Session

Microsoft has widely deployed field-programmable gate arrays (FPGAs) for accelerating search, networking, and machine learning—with a little help from the company’s software expertise and its FPGA programmers. Mike Strickland explains how a single FPGA can deliver significant acceleration for multiple workloads.

Swaminathan Sundaramurthy is an engineering manager at Pinterest, where he manages the company’s stream platform and ML training platform efforts. Previously, Swaminathan worked as an IC for more than 12 years, building distributed systems and cloud platforms at various large companies.

Presentations

Genji: A framework for building resilient near-real-time data pipelines Session

Pinterest has to support real-time decision making while operating on petabyte-scale data. Swaminathan Sundaramurthy and Mark Cho offer an overview of Pinterest's real-time data pipeline (modeled on a quasi-Kappa architecture), its impact on the company's systems, and the tools and processes used, and demonstrate how Pinterest models real-time ads analytics on the platform.

Mary Treseler is vice president of content strategy at O’Reilly Media, where she leads an editorial team that covers a wide range of topics from DevOps to design, and the chair of O’Reilly’s Velocity Conference. Mary has been working on technical content for 25 years, acquiring and developing content in areas such as programming, software engineering, and product design. A Boston native, Mary lives oceanside in Padanaram, MA.

Presentations

Tuesday opening welcome Keynote

Mary Treseler, James Turnbull, and Ines Sombra welcome you to the first day of keynotes.

Wednesday opening welcome Keynote

Mary Treseler, James Turnbull, and Ines Sombra welcome you to the second day of keynotes.

Andrew Turley is a software engineer at Wallaroo Labs, where he works on Wallaroo, a system for building high-performance event-driven applications.

Presentations

Developing scale-agnostic distributed systems with entities Session

The cost of coordinating access to information in a distributed system increases as the system scales up. Andrew Turley offers an overview of the entity-based approach to addressing this issue and explains how it has influenced the design of Wallaroo, a platform for building high-performance, event-driven systems.

James Turnbull is the CTO of Empatico. A longtime member of the open source community, James is the author of nine technical books about open source software: The Terraform Book, The Art of Monitoring, The Logstash Book, The Docker Book, Pro Puppet, Pulling Strings with Puppet, Pro Linux System Administration, Pro Nagios 2.0, and Hardening Linux. He was formerly CTO at Kickstarter and an advisor at Docker. James likes food, wine, books, photography, and cats. He is not overly keen on long walks on the beach or holding hands.

Presentations

Tuesday opening welcome Keynote

Mary Treseler, James Turnbull, and Ines Sombra welcome you to the first day of keynotes.

Wednesday opening welcome Keynote

Mary Treseler, James Turnbull, and Ines Sombra welcome you to the second day of keynotes.

Jeff Valeo is a lead site reliability engineer on the cloud infrastructure team at Grubhub. Previously, Jeff was a technical lead at Apple and an engineer at Google.

Presentations

Lessons learned from load-testing distributed systems Session

Load testing is a complicated and time-consuming process in the world of monolithic applications. And with the move to distributed systems (microservices), it is even more complicated. Jeffrey Valeo draws on real-world examples to share tips on how to effectively load-test distributed systems.

Seth Vargo is the director of technical advocacy at HashiCorp. Previously, he worked at Chef (Opscode), CustomInk, and a few Pittsburgh-based startups. He is the author of Learning Chef. Seth is passionate about reducing inequality in technology. When he is not writing, working on open source, teaching, or speaking at conferences, Seth enjoys spending time with his friends and advising nonprofits. He loves all things bacon.

Presentations

Microservices secrets management with Vault Tutorial

It’s great that you’ve moved to microservices, but how are you distributing secrets? Seth Vargo offers an overview of Vault’s unique approach to secret management by providing secrets as a service for your services (and your humans too), which is highly scalable and easily customizable to fit any environment.

Leif Walsh is an engineering manager at Two Sigma, where he works on the company’s next-generation data analysis platform for distributed time series research and simulation. Leif’s background is in high-performance storage. Previously, he built fractal trees at Tokutek. He loves the Oxford comma, cooking, and playing with cats.

Presentations

Scalable, fluent time series data analysis Session

Leif Walsh offers an overview of Flint, Two Sigma's open source time series extension to Spark, explains how it fits in with the Spark programming model, and lays out the roadmap for the future of pandas, PySpark, and Flint.

Sarah Wells is a principal engineer at the Financial Times, where she is currently building a semantic publishing platform that makes it easy to discover and access all the FT’s published content via APIs in a common and flexible format. Sarah has been a developer for 15 years, working across consultancy, financial services, and media.

Presentations

Operating microservices: Everything is at scale. Session

Most people think about microservices as a solution for scale. That may be the case, but operating them is definitely a scale challenge. Sarah Wells explains why, when you have 100+ services, everything needs to be automated, or else you'll spend two days updating Jenkins build pipelines or be woken up every night by false alarms caused by network blips.

Jesse White is a principal at Contino with more than 15 years of technology industry experience across financial services, healthcare, advertising, e-commerce, and IoT. A 10-year veteran of New York City’s startup ecosystem, he’s been busy building tech-focused human capital, with an eye for getting things done. As a founder of the DockerNYC Meetup group, Jesse has spent the last five years building teams and technology at the intersection of DevOps, cloud computing, and security. Jesse is an avid eater/listener/reader of good things.

Presentations

Docker production: Orchestration, security, and beyond Tutorial

Starting where previous Docker workshops leave off, Bret Fisher, Shawn Bower, and Tony Pujals dive into the new Swarm mode clustering (services), failover, blue-green deployments, monitoring, logging, troubleshooting, and security, covering the latest built-in features and common third-party tools as they walk you through installing them on your own five-node cloud Swarm cluster.

David Woods is a professor at the Ohio State University, where he is the lead for the Initiative on Complexity in Natural, Social, and Engineered Systems and the codirector of Ohio State University’s Cognitive Systems Engineering Laboratory. David is a former president of both the Resilience Engineering Association and the Human Factors and Ergonomics Society.

Presentations

Above the line, below the line: A preview of the SNAFUcatchers Stella Report Keynote

David Woods and Richard Cook offer a glimpse at the SNAFUcatchers Stella Report.

Susie Xia is a senior software engineer at LinkedIn, where she focuses on scalability and capacity analysis. Previously, she worked on mobile applications and automation.

Presentations

How LinkedIn determines the capacity limits of its services using live traffic Session

Susie Xia and Anant Rao explain how LinkedIn leverages live production traffic to determine service and resource bottlenecks at scale with a tool called Redliner and how you can use your current architecture to do the same.

Zhenzhong Xu is a software engineer working on highly scalable and resilient streaming data infrastructure at Netflix. Previously, he was a core contributor to Microsoft Azure data center operating system reconciliation management and resiliency functionalities. He is passionate about anything related to real-time data systems and large-scale distributed systems.

Presentations

Running a massively parallel stream processing system at Netflix Session

Keystone, a critical piece of Netflix's backend data infrastructure, ensures massive data movements and real-time event processing. Zhenzhong Xu leads a deep dive into Keystone's architecture and underlying stream processing engines, sharing insights and proven paths on how the company achieves multitenancy, scalability, and resilience in a complex cloud-native distributed system environment.