Building and maintaining complex distributed systems
October 1–2, 2017: Training
October 2–4, 2017: Tutorials & Conference
New York, NY

Speakers

Hear from a wide range of talented people who are doing amazing things in web operations, performance, DevOps, and systems engineering. New speakers are added regularly. Please check back to see the latest updates to the agenda.

Tom is a hands-on technical principal at ThoughtWorks, a software company and a community of passionate individuals who are seeking to revolutionize the IT industry. He believes the hardest problem for a developer is fighting the urge to solve a different, more interesting problem than the one at hand. Tom is interested in helping teams deliver awesome software products, sharing knowledge and all the tools (especially Java).

Presentations

Creating pipelines that build, test and deploy containerized artifacts. Tutorial

Containerization has launched a new wave of software deployment models. But do our philosophies for building, testing, and deploying software still hold true? This workshop provides a hands-on look at transforming a continuous integration pipeline from creating software artifacts to building, testing, and deploying container images.

Neeraj is an engineering manager on the Infrastructure team at Quora, leading the company-wide effort to make Quora fast and responsive. He started as an engineer working on server-side performance at Quora in 2011 and has since revamped how the company thinks about performance. In his role as the speed lead, he has built the speed team from the ground up and established a culture of “performance matters” at Quora. Before leading performance efforts, he worked on multiple other parts of the product in his six-year tenure at Quora – from writing the first version of Quora for iPhone to building a distributed real-time ranking service for their personalized news feed product.

Presentations

Scaling ML Systems Powering Quora's Home Feed Session

Millions of people visit Quora's home feed to find high-quality content personalized to their interests. It is powered by a highly performant distributed system running sophisticated ML algorithms. In this talk, I will describe the evolution of its architecture and share several lessons from building and scaling this system.

Vinu Charanya is a Senior Software Engineer at Twitter, where she works in the Compute Platform building Twitter’s internal cloud infrastructure management platform. She is also a core team member of Women Who Code, a non-profit organization dedicated to inspiring women to excel in technology careers, and a part-time teacher and mentor helping students learn iOS and Android development at CodePath. Vinu received her Masters in Computer Science and Engineering from University at Buffalo, where she worked in the PhoneLab Testbed research group under Prof. Geoff Challen and Prof. Steve Ko.

Presentations

How we built a framework at Twitter to improve infrastructure utilization & efficiency at scale Session

Twitter is powered by thousands of microservices that run on our internal Cloud platform, which consists of a suite of multi-tenant platform services that offer Compute, Storage, Messaging, Monitoring, etc. as a service. In this talk, I share my team's work that helps capture & define, provision, meter & charge infrastructure resources, redefining how systems are built atop Twitter Infrastructure.

Kris Beevers, founder and CEO of NS1, is an internet infrastructure geek and serial entrepreneur who’s started two companies, built the tech for two others, and has a particular specialty in architecting high volume, globally distributed internet infrastructure. Before NS1, Kris built CDN, cloud, bare metal, and other infrastructure products at Voxel, a NY based hosting company that sold to Internap in 2011. He holds BS, MS, and PhD degrees in computer science from RPI.

Presentations

From prototype to mission critical: the evolution of edge architecture at a global DNS service provider Session

We'll discuss the evolution of the edge delivery architecture of a major DNS service provider – from our earliest prototypes to the large, heavily automated global network we operate today – and the many operational lessons we've learned along the way.

Blake Bisset got his first legal tech job at 16. He won’t say how long ago, except that he’s legitimately entitled to make shakeyfists while shouting “Get off my LAN!” He’s done 3 start-ups (a joint venture of Dupont/ConAgra, a biotech spinoff from the U.W., and this other time a bunch of kids were sitting around New Year’s Eve, wondering why they couldn’t watch movies on the Internet), only to end up spending a half-decade as an SRM at YouTube and Chrome, where his happiest accomplishment was holding the go/bestpostmortem link for several years.

Presentations

Persistent SRE Antipatterns: Pitfalls On the Road to Creating a Successful SRE Program Like Netflix and Google Session

People aren't just wrong on the internet. Sometimes they bring it back to the office. We're here to debunk the biggest traps we've stepped in, spent good drink money learning about from other people who'd stepped in them, or seen someone who hadn't stepped in them yet propose as good practice. Save yourself some pain. Or just laugh at ours.

Presentations

Docker production: Orchestration, security, and beyond Tutorial

Starting where previous Docker workshops leave off, Bret Fisher, Laura Frank, and Tony Pujals dive into the new Swarm mode clustering (services), failover, blue-green deployments, monitoring, logging, troubleshooting, and security, covering the latest built-in features and common third-party tools as they walk you through installing them on your own five-node cloud Swarm cluster.

In VM (aka Vicky)’s nearly 20 years in the tech industry, she has been an analyst, programmer, product manager, software engineering manager, director of software engineering, and C-level technical business and open source strategy consultant. Vicky is the winner of the Perl White Camel Award (2014) and the O’Reilly Open Source Award (2016).

Vicky occasionally blogs at http://anonymoushash.vmbrasseur.com, often writes and is a community moderator for opensource.com, and frequently tweets at @vmbrasseur.

Presentations

Find Your Way: Orienteering for Managers Session

Are you managing distributed teams, with very different stakeholders, and/or a mix of hobbyists and paid staff? It probably all seemed easy at first, but the further you travel, the more unfamiliar the terrain starts to appear. Luckily, this is not all new ground: many have gotten lost here before and found their way out again. We will provide the map back to productive, happy teams.

Senior Software Engineer
Netflix Playback Licensing Team

Presentations

Event Sourcing at Global Scale: Netflix Downloads Session

The Netflix Download feature allows users to download and play content offline. This feature required a new persistence architecture to maintain the state of user devices and content licenses. Traditional solutions would not meet the demands of a globally distributed and scaled service. We will explore the technical decisions behind the choice of a Cassandra Event Sourcing data store.

Yevgeniy (Jim) Brikman loves programming, writing, speaking, traveling, and lifting heavy things. He does not love talking about himself in the 3rd person. He is the co-founder of Gruntwork, a company that provides DevOps as a Service. He’s also the author of two books published by O’Reilly Media: “Hello, Startup: A Programmer’s Guide to Building Products, Technologies, and Teams” and “Terraform: Up & Running.” Previously, he worked as a software engineer at LinkedIn, TripAdvisor, Cisco Systems, and Thomson Financial and got his BS and Masters at Cornell University. For more info, see ybrikman.com.

Presentations

Infrastructure as Code with Terraform 2-Day Training

Terraform has emerged as a key tool for managing infrastructure as code across a variety of platforms, including AWS, Google Cloud, and Azure. Yevgeniy Brikman gets you up & running quickly with Terraform using real-world examples: deploy servers, DBs, and load balancers on AWS; build immutable infrastructure with Docker and Packer; put it all together in a continuous delivery pipeline.

Brendan Burns is a director of engineering at Microsoft Azure, where he runs the Container Service and Resource Manager teams, and a co-founder of the Kubernetes open source project. Previously, he worked at Google on cloud APIs and web search infrastructure and was a professor of computer science at Union College. Brendan holds a PhD in computer science from the University of Massachusetts Amherst and a BA in computer science and studio art from Williams College.

Presentations

Distributed Systems Patterns from Design to Reality Session

Formal patterns for distributed systems are emerging. The use of these patterns makes it significantly easier to design and deploy reliable, scalable distributed systems. However, these patterns generally live in white papers and books. This talk describes transforming these patterns into containers and a custom Kubernetes API, which you can use to simply instantiate a distributed system via a declarative API.

Tammy Butow is a site reliability engineering manager at Dropbox, where she is the team lead for the Databases and Magic Pocket SRE teams. She enjoys working on infrastructure engineering and is interested in chaos engineering, antifragile systems, automation, Go, and Linux. Previously, Tammy worked in security engineering and product engineering. She is the cofounder of Girl Geek Academy, a global movement to teach 1 million women technical skills by 2025. Girl Geek Academy received support from the Australian prime minister and a grant from the Australian government in 2016 to scale the Miss Makes Code program, which is aimed at teaching algorithms to 5- to 8-year-old girls. An Australian, Tammy currently lives in San Francisco, where she likes to ride bikes, skateboard, snowboard, and surf. She also loves mosh pits, crowd surfing, metal, and hardcore punk.

Presentations

Chaos engineering bootcamp Tutorial

Chaos engineering is the discipline of experimenting on a distributed system in order to build confidence in the system’s capability to withstand turbulent conditions in production. Tammy Butow leads a hands-on tutorial on chaos engineering, covering the tools and practices you need to implement chaos engineering in your organization.

Jack Chan is a Sr. Engineering Manager at Shutterfly’s Photos group. He was recently involved heavily with helping the company with a Hybrid Cloud migration solution with Photos-related API services on AWS, paired with a set of core services in a private data center.

Jack has been working in software engineering development closely partnering with operations for quite some time, helping startups scale up to millions of users with Cloud solutions. Prior to that he worked in IT organizations at Adobe, Apple and 3Com.

Presentations

How Shutterfly migrated 10+ billion photos to the Cloud Session

Jack Chan describes how Shutterfly migrated metadata from over 10B photos from a private data center into AWS in 100 days and explores designs to absorb mountains of metadata, on-premises eCommerce integration, and parallel user experiences, all in a highly scalable fashion. Shutterfly Photos is now a Hybrid Cloud solution with images hosted on-premises and client-facing photos metadata on AWS.

Presentations

Building real-time data pipeline to support efficient low-latency applications Session

We had to support real-time decision making while operating on petabyte scale data. We modeled our warehouse on quasi-Kappa architecture (rather than Lambda), treating batch processing as a special case of stream processing. In this talk, we'll highlight our architecture, its impact on our systems, tools and processes, and demonstrate how we modeled real-time ads analytics on the platform.

Rob Claire is an engineer on the Visibility team at Pinterest focusing on extracting insight from real-time operational data. Rob has over 17 years of experience in the fields of data engineering, DevOps, and performance tuning, including stints at One King’s Lane, Slide, Ning, and eBay.

Presentations

Drinking from the Firehose: Building a massive scale monitoring stack Session

In this session, you will learn about the technical challenges in building a monitoring stack that can reliably process millions of events per second. We'll talk about specific technologies, including Spark Streaming, Kafka, and HBase, as well as the best practices for managing monitoring data.

Physician, researcher, and educator Richard Cook is presently a research scientist in the Department of Integrated Systems Engineering at the Ohio State University in Columbus, Ohio, and emeritus professor of healthcare systems safety at Sweden’s KTH. Richard is an internationally recognized expert on safety, accidents, and human performance at the sharp end of complex, adaptive systems. His most often cited publication is “Going Solid: A Model of System Dynamics and Consequences for Patient Safety.”

Presentations

David Woods & Richard Cook Keynote

Keynote with David Woods & Richard Cook

Miro Cupak is a senior software engineer at DNAstack, where he develops a leading genomics cloud platform. He is a Java enthusiast with expertise in distributed systems and middleware, and the creator of the largest search and discovery engine of human genomic data. In his spare time, he blogs and contributes to several open-source projects.

Presentations

How we built a global search engine for genetic data Session

Beacon Network is arguably the largest search and discovery engine of human genomic data in the world, and the result of collaboration between developers, researchers, and scientists on a global scale. This session describes the architecture and technologies behind the system, with a focus on the technical decisions that allowed us to scale and disrupt the perception of genetic data.

Bart De Vylder is a data scientist at CoScale. Previously, Bart was active in software engineering and architecture, with a focus on distributed systems. His interests lie in machine learning and building reliable, scalable data processing systems. Bart holds a PhD in artificial intelligence from the Free University of Brussels.

Presentations

A hands-on data science crash course for modeling and predicting the behavior of (large) distributed systems Tutorial

Data science is a hot topic. Bart De Vylder offers a practical introduction that goes beyond the hype, exploring data analysis, visualization, and machine-learning techniques using Python for modeling the behavior of distributed systems. You'll leave with a solid starting point to implement data science techniques in your infrastructure or domain of interest.

Kate is a Systems Engineer in a Delivery Engineering team at REA, the company behind Australia’s No.1 Property Site, http://realestate.com.au. There, she gets to build automation tooling that integrates with tech like Docker, AWS, and Kubernetes to make building, packaging, and deploying applications easy at REA.

She spent the first 6 years of her career as a Developer, before transitioning into a Systems Engineer role for the last year. This transition has given her a keen insight into the delivery issues faced by developers and a passion for writing clean, well tested and maintainable code that is often missing from automation tooling.

Kate is determined to make a difference by introducing more women into the tech community and organises free training events such as Rails Girls and DevOps Girls in her home town of Melbourne, Australia.

Presentations

Automation run rampant Session

With the rise of microservices and DevOps culture, engineers are finding themselves responsible for all facets of a rapidly growing number of systems. Luckily for you, many of the processes managing these systems can be automated! But where do you begin? How do you know when something is ripe for automation? Is there such a thing as bad automation? And how do you take the first step?

Frederik is a software engineer at Fastly. He has been a developer for nearly 20 years, mostly in C and a bit of Go. He has worked on telecom systems for most of his career: from a fast carrier-grade SMTP/DNS/SMS server to a satellite-oriented TCP/HTTP proxy. He now leads the HTTP/2 efforts at Fastly, the real-time CDN.

Presentations

A Hands-on Approach to HTTP/2 Development Tutorial

HTTP/2 (or "H2" as the cool kids call it) has been ratified for months and browsers already support it. Do the exciting features that HTTP/2 offers meet expectations? How does HTTP/2 fare in the real world? How is browser behavior changing to accommodate new server-side functionality? How can you get the most out of the new protocol everybody’s talking about?

Rob is a systems architect and software engineer who’s never met a programming language he didn’t like. At Intel, he’s a member of the NVML development team, and is the technical lead for pmemkv, a key/value datastore optimized for persistent memory. Prior to Intel, Rob led development of end-user monitoring products at Quest Software and Dell Software, but won’t admit to how many startups it took to get there. Rob lives with his wife, three kids, and snowboard collection in Boulder, Colorado.

Presentations

4 Things I Wish I'd Known Sooner About Persistent Memory Session

On the surface, adapting software to use persistent memory seems obvious. After all, persistent memory is simply fast memory that maintains state when the power goes out, like an SSD. But unlike SSDs, the rise of persistent memory challenges long-held ideas and conventions about how software works. This session will present four key ideas to help focus your persistent memory strategy.

Bret Fisher is a Virginia Beach-based freelance DevOps and Docker consultant, trainer, speaker, and open source volunteer. Bret has been a cloud and data center ops and system administrator for 20 years. Currently, he helps teams Dockerize their apps and systems and improve their speed of deployment, resiliency, metrics, and awareness (all that DevOps-y stuff). Bret is a Docker Captain and Code for America Brigade Captain. He runs several monthly meetups, speaks at conferences, and is obsessed with containerizing any app he sees. (He’ll likely talk your ear off about it next time you meet.) Bret also develops in Node.js, Bash, and general web, usually for open source projects. In his free time, he does CrossFit, surfs a little, geeks out in the awesome local dev community in Virginia Beach, and travels with his wife.

Presentations

Docker production: Orchestration, security, and beyond Tutorial

Starting where previous Docker workshops leave off, Bret Fisher, Laura Frank, and Tony Pujals dive into the new Swarm mode clustering (services), failover, blue-green deployments, monitoring, logging, troubleshooting, and security, covering the latest built-in features and common third-party tools as they walk you through installing them on your own five-node cloud Swarm cluster.

Andrew is a Director of Engineering at Dropbox. He has managed the Production Services, Telemetry, and Storage teams and currently heads up Development Infrastructure at Dropbox. Before Dropbox, he worked at YouTube helping to scale their infrastructure. He previously was at AOL running proxy/cache and video search infrastructure.

Presentations

Best Practices in Data Migrations Session

In 2016 Dropbox migrated 600 petabytes of data from managed cloud storage into its own datacenters. You will hear lessons learned and best practices for data migrations.

Jess Frazelle is a software engineer at Google. She loves all things involving Linux namespaces and cgroups and is probably most well known for running desktop applications in containers. Jessica has been a maintainer of Docker and a contributor to RunC, Kubernetes, Linux, and Golang, among other projects, maintained the AppArmor, Seccomp, and SELinux bits in Docker, and is quite familiar with locking down containers.

Presentations

The Role of Open Source in a Company Keynote

Learn how to be effective at open source from both community and commercial perspectives from technologists who have spent their careers managing the balancing act.

Nikhil Garg is an engineering manager at Quora where he has led various ML teams including Quality, Ads, and ML Platform. He is interested in the intersection of machine learning, distributed systems, and human psychology.

Presentations

Scaling ML Systems Powering Quora's Home Feed Session

Millions of people visit Quora's home feed to find high-quality content personalized to their interests. It is powered by a highly performant distributed system running sophisticated ML algorithms. In this talk, I will describe the evolution of its architecture and share several lessons from building and scaling this system.

Felix started and sold a startup, room.me. He enjoys working on networks and security. He now works for Shopify.

Presentations

Automated Bot Squashing - How X decreased its traffic by 50% by fighting bots across Y stores Session

X hosts Y online stores, some with massive followings that release tiny numbers of sought-after products. Those products get bought and resold for a huge profit: up to ten times the sticker price. And when milliseconds matter, bots buy faster than humans. We cut down up to 50% of our traffic by writing our own bot detection software which doubled as a robust DDoS protection system.

Sebastien Goasguen is a twenty-year open source veteran. A member of the Apache Software Foundation, he worked on Apache CloudStack and Libcloud for several years before diving into the container world. He is the founder of Skippbox, a Kubernetes startup acquired by Bitnami, where he currently serves as Senior Director of Cloud Technologies. An avid blogger, he enjoys spreading the word about new cutting-edge technologies. Sebastien is the author of the O’Reilly Docker Cookbook and 60 Recipes for Apache CloudStack.

Presentations

Kubernetes Training 2-Day Training

Kubernetes is one of the highest velocity projects on GitHub. Based on 15 years of experience managing containerized applications at Google, Kubernetes is becoming the leading platform to build your distributed applications on.

Dina Goldshtein is a Senior Software Engineer at Aternity (recently acquired by Riverbed). Aternity builds performance monitoring tools that run on millions of PCs and mobile devices. Dina is on the team responsible for the core collection mechanism, which hooks low-level Windows events and collects performance information from a variety of sources. Since starting at Aternity, she worked a lot on boot performance monitoring, identifying bottlenecks in the Windows boot process and on monitoring user-experience on the Web.
Prior to her work at Aternity, Dina worked at BrightSource Energy, where she led the Software Foundations team, developing software infrastructures used by the entire R&D department. As part of her job, she was also responsible for bringing in new technologies, mentoring, and improving quality and development processes department-wide.

Presentations

ETW - Monitor Anything, Anytime, Anywhere Session

ETW is the most important diagnostic tool Windows developers have at their disposal. In this talk, we'll explore the rich and wonderful world of ETW events, which span numerous OS components. You’ll learn how to diagnose complex issues in production systems and explore some ways to automate ETW collection and analysis to build self-diagnosing applications.

Sasha Goldshtein is the CTO of Sela Group, a Microsoft MVP and Regional Director, Pluralsight author, and international consultant and trainer. Sasha is the author of two books and multiple online courses, and a prolific blogger. He is also an active open source contributor to projects focused on system diagnostics, performance monitoring, and tracing — across multiple operating systems and runtimes. Sasha authored and delivered training courses on Linux performance optimization, event tracing, production debugging, mobile application development, and modern C++. Between his consulting engagements, Sasha speaks at international conferences world-wide.

Presentations

Fast and Safe Production Monitoring of JVM Applications with BPF Magic Session

In this talk, we will see a holistic set of BPF-based tools for monitoring JVM applications on Linux, and revisit a systems performance checklist that includes classics like fileslower, opensnoop, and strace -- all based on the non-invasive, fast, and safe BPF technology.

Linux Performance Monitoring with BPF Tutorial

In this workshop, you will experiment first-hand with a brand new Linux kernel tracing technology -- BPF. You will learn how to apply a collection of BPF-based tools to diagnose high CPU usage, memory leaks, file I/O issues, network requests, and many other complex scenarios -- with nearly zero overhead.

With a background in video game QA and a degree in theoretical math, I now assure quality for a DevOps tool called VictorOps. When I’m not working, I enjoy playing games with friends far and wide, learning new things and playing outside with my kids.

Presentations

Lessons in Interpersonal Dynamics from Massively Multiplayer Online Games Session

In 2005, a bug in World of Warcraft helped epidemiological research in unexpected ways. Several epidemiologists have used massively multiplayer online games to better understand how to effectively model dynamic systems of disease propagation. Today, we can use similar techniques to learn more about how to build and maintain more effective software engineering and DevOps teams.

Michael is a Developer Advocate for OpenShift and Kubernetes at Red Hat, where he helps appops to build and operate distributed services. He shares his experience with distributed systems and large-scale data processing through demos, blog posts, and public speaking engagements, and contributes to open source software such as OpenShift and Kubernetes. Prior to Red Hat, Michael was a Developer Advocate at Mesosphere and Chief Data Engineer at MapR Technologies, and before that he was a Research Fellow at the National University of Ireland, Galway, where he researched large-scale data integration and the Internet of Things and gathered experience in advocacy and standardization (World Wide Web Consortium, IETF).

Presentations

Let’s Go! Using the Go programming language for system tasks 2-Day Training

On day 1 of this training you will learn Go from scratch and on the second day understand how to use it for system tasks such as batch file operations, container inspection or access control automation as well as apply the knowledge in your own project.

Lara Hogan is the VP Engineering at Kickstarter and the author of Designing for Performance and Demystifying Public Speaking. She champions performance as a part of the overall user experience, striking a balance between aesthetics and speed and building performance into company culture.

Presentations

Lara Hogan Keynote

Keynote with Lara Hogan

Jonah Horowitz is a Site Reliability Engineer with Stripe. He works with all of the individual engineering teams at Stripe to drive reliability efforts. This includes monitoring, alerting, deployment pipelines and chaos resiliency. Before coming to Stripe he worked at several startups around the Bay Area including: Netflix, Quantcast – a leading ad-tech startup where he grew their network to process over 3 million events per second, Looksmart – a contextual advertising company, and he was on the founding team of Wal-Mart.com (now Walmart Labs) where he built out their software deployment pipelines and their product image management systems.

Presentations

Persistent SRE Antipatterns: Pitfalls On the Road to Creating a Successful SRE Program Like Netflix and Google Session

People aren't just wrong on the internet. Sometimes they bring it back to the office. We're here to debunk the biggest traps we've stepped in, spent good drink money learning about from other people who'd stepped in them, or seen someone who hadn't stepped in them yet propose as good practice. Save yourself some pain. Or just laugh at ours.

Presentations

Building real-time data pipeline to support efficient low-latency applications Session

We had to support real-time decision making while operating on petabyte scale data. We modeled our warehouse on quasi-Kappa architecture (rather than Lambda), treating batch processing as a special case of stream processing. In this talk, we'll highlight our architecture, its impact on our systems, tools and processes, and demonstrate how we modeled real-time ads analytics on the platform.

Ignat is a security engineer at Cloudflare working mostly on platform and hardware security. Ignat’s interests are cryptography, hacking, and low-level programming. Before Cloudflare, Ignat worked as a senior security engineer for Samsung Electronics’ Mobile Communications Division. His solutions may be found in many older Samsung smart phones and tablets. Ignat started his career as a security researcher in the Ukrainian government’s communications services.

Presentations

Managing server secrets at scale with a vaultless password manager Session

Ever wondered how to quickly and efficiently roll over all of your 1000 servers’ SSH keys? How to securely manage diskless systems? This talk will introduce a simple approach that combines hardware support and a little cryptography to help operationalise the management of all the secrets in your cloud.

John has over 17 years of experience working in software engineering as a system administrator, software engineer, technical lead, technical director, development manager and agile coach.

He currently runs the consultancy firm Wise Noodles, where he helps organisations solve tough technical problems by untangling their people problems. He is also host of The Agile Path Podcast, which creates in-depth audio documentaries on the topics that most affect organisations transitioning to agile ways of working.

Presentations

Pay Attention! Why you should care about psychological safety. Session

Over 3 months John recorded over 60 hours of interviews and spoke to some of the most respected people in the industry to produce an audio documentary that attempts to answer the question “What is safety? And why is it important anyway?”. This highly interactive talk will guide and challenge you through a series of role play and improvisation exercises on a journey to understanding safety.

Presentations

Claire Le Goues Keynote

Keynote by Claire Le Goues

Bryan Liles works on the Cloud Engineering team at Capital One. When not helping a huge bank move to the public cloud, he gets to speak at conferences on topics ranging from machine learning to building the next generation of developers. In his free time, Bryan races cars in straight lines and around turns and builds robots and devices.

Presentations

Ben is an Engineer living in Brooklyn, NY.

Presentations

Putting Your First Paper Into Production Session

Machine Learning is as accessible as it has ever been, but it’s not always obvious how to go from a cool paper to serving production traffic. This talk is a distillation of lessons learned solving real problems with Machine Learning at Kickstarter.

Phil is a professional software engineer living in Boston, Massachusetts with extensive experience building and operating distributed systems and continuous delivery pipelines in both the Internet of Things and web services spaces. He is currently building the cloud services for Datawire.io’s resilient microservices framework.

Presentations

Developing Resilient Microservices with Kubernetes and Envoy Tutorial

Microservices is an increasingly popular approach to building cloud-native applications. Dozens of new technologies that streamline adopting microservices development such as Docker, Kubernetes, and Envoy have been released over the past few years. But how do you actually use these technologies together to develop, deploy, and run microservices?

Kelly Looney is the Director of DevOps Consulting at Skytap, and a frequent public speaker at events like DevOps Days, as well as Skytap-hosted panels and roundtable discussions. Kelly joined Skytap after leading a large-scale Agile and DevOps transformation at bwin.party in Vienna, Austria that involved over 900 development and operations personnel. Kelly has varied experience with all types of software development and operations, across many business areas. He has worked with organizations such as Capgemini, Booz Allen Hamilton, and more recently with the Agile and DevOps focused firms ThoughtWorks and Valtech, consulting with development organizations all over the world. Kelly has personally worked with industry luminaries such as Dick Gabriel, Kent Beck, Jez Humble, and Luke Hohmann. He has a great grasp of both the theory and the practical problems that go along with the transition to Agile and DevOps.

Presentations

How Do You Eat a Whale? One Byte at a Time. Session

In this talk we will show how an incremental approach to introducing containers into complex, distributed applications results in modernization with less risk and more reward. You’ll learn how to best evaluate which components of your applications are best-suited first for containers, how to get fast feedback, and how to increase your container adoption for more dynamic systems management.

Mark McBride was services engineer lead at Nest Labs and Google, responsible for the development of Nest’s server infrastructure, which makes it possible for Nest customers to connect with their homes from wherever they are. Prior to Nest, Mark was an early developer on Twitter’s streaming API, delivering thousands of messages per second in real time to millions of users. He also managed developer productivity and led the web delivery, developer tools and infrastructure test teams during his time at Twitter. He’s worked with a variety of deploy pipelines. He led development of some of Twitter’s early service migrations, which grew into a suite of tools used to migrate millions of requests per second from legacy services to modern replacements. He is currently founder and CEO of Turbine Labs, building products that help engineers ship features more quickly and safely.

Presentations

Customer-centric observability Session

With the recent flourishing of observability systems, there's no shortage of things to monitor. Sadly, humans have limited capacity to process them all. Focusing on the customers' viewpoint allows you to deal with a tractable data set, which in turn allows you to evaluate and discuss system performance more effectively.

Mike McGarr is the engineering manager for the Netflix Developer Productivity team as well as the founder of the Developer Experience Silicon Valley meetup. Mike has been developing Java/JVM-based applications for most of his career and has been known to dabble in other languages as well. He is passionate about building quality software through automation. Mike can frequently be found talking about agile, continuous delivery, DevOps, or build and test automation. Mike is also a former cohost of the Ship Show podcast. Prior to joining Netflix, Mike was the director of DevOps at Blackboard and the founder of the DC Continuous Delivery meetup.

Presentations

The rise of polyglot at Netflix Session

Netflix has always been a Java shop, since the early DVD days, continuing into our migration to the cloud. This simplified the job for centralized teams. But as the popularity of non-JVM languages rose, centralized teams have begun to rethink their support strategy. This talk is about the early days of our polyglot journey and where we are going.

Tyler McMullen is CTO of Fastly, where he is responsible for the system architecture and leads the company’s technology vision. As part of the founding team, Tyler built the first versions of Fastly’s instant purging system, API, and real-time analytics. Before Fastly, Tyler worked on text analysis and recommendations at Scribd. A self-described technology curmudgeon, Tyler has experience in everything from web design to kernel development and loathes all of it. Especially distributed systems.

Presentations

Building a skyscraper with Legos: The anatomy of a distributed system Session

Much has been written and said about distributed systems of many different sizes, scales, and complexities. Most of the time though, what we talk about are algorithms and techniques. But the practical realities of distributed systems are far from straightforward.

Carin started off as a professional ballet dancer, studied Physics in college, and has been developing software for both the enterprise and entrepreneur ever since. She has a strong background in Ruby and Clojure. Her passions lead her to the intersection of the physical and digital world, combining hardware and software, where she has helped clients develop Home Automation Systems as well as written a control library for the Parrot AR Drone in Clojure. She is highly involved in the community and has spoken at many conferences, including keynoting at OSCON and Strange Loop. She helps lead the Cincinnati Functional Programmers and is the author of “Living Clojure”.

Presentations

Unconventional Programming Paradigms for the Future Now Keynote

As our technology advances, our systems are getting more and more complex, reaching the threshold of what we can handle and even comprehend. We need more than tools to keep it under control. We need new ways of thinking.

Terran has worked in the consumer web space for the last decade, including software development, operations, data warehousing, and data science. Most recently, he built up and managed teams at TripAdvisor and Jobcase, with a focus on hiring generalists and teaching them about the specifics. Terran is passionate about continuing education for high-performing professionals and is excited to share some of his lessons and experiences with you.

Presentations

Debugging Complex Systems Session

In this talk, I'll describe an organized process for making observations of a misbehaving complex system, reasoning about possible causes, and isolating the fault. This process is not taught in any college curriculum I know of, but all the successful senior engineers with operational experience that I've talked to use a variant of this process.

Jon Moore is the Chief Software Architect at Comcast Cable, where he focuses on delivering a core set of scalable, performant, robust software components for the company’s varied software product development groups. He specializes in the “art of the possible,” finding ways to coordinate working solutions for complex problems and deliver them on time (even in large enterprises). Jon is equally comfortable leading and managing teams and personally writing production-ready code.

Jon has a passion for software engineering, continuously learning and then teaching colleagues new ways to deliver working, maintainable software with ever-higher quality and ever-shorter delivery times. His current interests include distributed systems, fault tolerance, building healthy and engaging engineering cultures, and Texas Hold’em. Jon received his Ph.D. in Computer and Information Science from the University of Pennsylvania and currently resides in West Philadelphia, although he was neither born there nor raised there and does not spend most of his days on playgrounds.

Presentations

The Art of the Possible Session

How does a large, 50+-year-old company go from purchasing much of its technology and year-long release cycles to building multiple products in-house and daily releases? Jon will trace the changing set of tools, techniques, and attitudes that have powered (and still power) this transformation at Comcast over the last decade, mapping out a path you can follow in your company.

Neha Narula is director of research at the Digital Currency Initiative, a part of the MIT Media Lab where she teaches courses and leads cryptocurrency and blockchain research. While completing her PhD in computer science at MIT, she built fast, scalable databases and secure software systems, and she spoke about these topics at dozens of industry and research conferences.

In a previous life, Narula helped relaunch the news aggregator Digg and was a senior software engineer at Google. There, she designed Blobstore, a system for storing and serving petabytes of immutable data, and worked on Native Client, a system for running native code securely through a browser.

Presentations

Keynote by Neha Narula Keynote

Details to come.

Neha Narula Keynote

Keynote by Neha Narula

Lex has 7 years of experience keeping large services running, including Linden Lab’s Second Life, DeviantArt.com, and Heroku. While originally trained in computer science, he found that he most enjoyed applying his software engineering skills to operations. A veteran of many large incidents, he has strong opinions on incident response, retrospectives, on-call sustainability, and good development and release processes.

Presentations

The Phone Book is On Fire: Lessons from the Dyn DNS DDoS Session

When the DDoS attack crushed Dyn last October, did your DNS fail? Ours sure did. I wondered, “Should I add a new DNS provider?” A: Nope, the TTL kills you. “Should I lower the TTL on my NS records?” Spoiler alert: it won’t help. Over the next month, I searched, asked pros, and even did direct experimentation. Join me to find out what does work and all the crazy details of DNS that I uncovered.

Deb Nicholson works at the intersection of technology and social justice and has over 15 years of nonprofit management experience. Deb got involved in the free software movement about five years ago when she started working for the Free Software Foundation. She is currently the community outreach director for the Open Invention Network—the defensive patent pool built to protect Linux projects. She is also the community manager for GNU MediaGoblin, a brand-new federated media hosting program. In her spare time, Deb serves on the board of OpenHatch, a small nonprofit dedicated to identifying and mentoring new free software contributors, with a particular interest in building a more diverse free software movement.

Presentations

Find Your Way: Orienteering for Managers Session

Are you managing distributed teams, with very different stakeholders, and/or a mix of hobbyists and paid staff? It probably all seemed easy at first, but the further you travel, the more unfamiliar the terrain starts to appear. Luckily this is not all new ground; many have gotten lost here before and found their way out again. We will provide the map back to productive, happy teams.

Michelle Noorali is a software engineer at Deis and a Core Maintainer on the Helm project, a package manager for Kubernetes. Michelle also co-leads SIG-Apps which is the Kubernetes special interest group for running and managing applications on Kubernetes. She is primarily a Go developer but has Ruby roots. Michelle is passionate about developer experiences.

Presentations

Managing Applications on Kubernetes with Helm Session

Kubernetes is a powerful container orchestration platform that has seen unprecedented traction and adoption in the last few years. It can however be tedious and draining to figure out how to actually deploy your applications on Kubernetes if you're new to the space. In this session, you'll learn how to configure, deploy, and manage applications on Kubernetes using an open source tool called Helm.

Alex is a database engineer, Apache Cassandra committer, and polyglot programmer interested in high-performance systems and algorithms.

Presentations

What We Talk About When We Talk About On-Disk Storage Session

Techniques discussed in this talk will help you pick the right database, understand which indexes are best to use and what trade-offs different types of storage bring, scale out your data and plan for growth, and find the best additional resources on the subject. In the world of Big and Fast Data, it's important to be fluent in storage and know the right tools for each job.

Guy Podjarny is a cofounder and CEO at Snyk.io focusing on securing open source code. He was previously CTO at Akamai and founder of Blaze.io. He also worked on the first web app firewall and security code analyzer. Guy is a frequent conference speaker, the author of Responsive & Fast, High Performance Images, and the upcoming Securing Third Party Code, and the creator of Mobitest. He also writes on Guypo.com and Medium.

Presentations

Serverless Security: What's Left to Protect? Session

Serverless means handing off server management to the cloud platforms—along with their security risks. With the “pros” ensuring our servers are patched, what’s left for application owners to protect? As it turns out, quite a lot. This talk discusses the aspects of security serverless doesn’t solve, the problems it could make worse, and the tools and practices you can use to keep yourself safe.

Tony Pujals is a Docker Captain and the director of cloud engineering at Appcelerator, where he focuses on improving the process of building, deploying, orchestrating, and monitoring containerized microservices. Tony is fanatical about Docker, Go, Node.js, APIs, microservices, serverless computing, distributed systems, and scalable cloud architecture. He is a co-organizer of the Mountain View Docker meetup.

Presentations

Docker production: Orchestration, security, and beyond Tutorial

Starting where previous Docker workshops leave off, Bret Fisher, Laura Frank, and Tony Pujals dive into the new Swarm mode clustering (services), failover, blue-green deployments, monitoring, logging, troubleshooting, and security, covering the latest built-in features and common third-party tools as they walk you through installing them on your own five-node cloud Swarm cluster.

Ilan is Director of Technical Community at Datadog. Prior to joining Datadog, Ilan spent a number of years leading infrastructure and reliability engineering teams at organizations such as Ooyala and Edmunds.com. In addition to his work at Datadog, he is active in the open-source and DevOps communities, where he is a co-organizer of events such as SCALE, Texas Linux Fest, DevOpsDay LA and DevOpsDays Silicon Valley.

Presentations

Monitoring Containers: Follow the Data Session

Using real-world metrics data from thousands of organizations, I'll share the latest trends in container adoption and use. I'll also share data on what types of applications organizations are running in containers and how to best monitor those containerized applications.

A software hacker and Linux enthusiast, Augustina has a plethora of computer industry experience in roles that include Software Engineering, Web Development, IT Support, System Administration, Release Engineering, and Technical Communications. Besides coding for food, Augustina also contributes to various open source projects. Based out of PDX, Augustina has served as organizer and participates in a variety of user groups and enjoys both attending and speaking at conferences.

Presentations

What is the value of Continuous Integration (now with real data!) Session

While CI has been marketed as being vital for software development, existing research only measures perception of CI value through web-based surveys or limited case studies. I've identified 3 key values CI provides and used public GitHub data to quantify these values. This talk will show how CI really affects community projects and show you how to measure the value of CI for your own projects.

Anant Rao is an engineering lead at LinkedIn. He has been working on performance optimization and capacity planning for the last couple of years, focusing on making the apps go fast and on building infrastructure to catch performance issues before they make it to production.

Presentations

How LinkedIn Determines the Capacity Limits of its Services using Live Traffic Session

This talk will describe how one can leverage live production traffic to determine peak throughput bottlenecks. We will share the methodology LinkedIn uses to determine service and resource bottlenecks at scale through a tool called "Redliner" and how you can use your current architecture to do the same.

Tanya Reilly is a Site Reliability Engineer on Google’s bootstrapping team.

Presentations

Have you tried turning it off and turning it on again? Session

Even simple sites may be difficult to recover after a disaster. Backups are not enough! Complex systems are much harder to reason about, and can even be coupled together in ways that make them unrecoverable.

Senior Software Engineer
Netflix Playback Licensing Team
https://www.linkedin.com/in/robert-reta-a17b918

Presentations

Event Sourcing at Global Scale: Netflix Downloads Session

The Netflix Download feature allows users to download and play content offline. This feature required a new persistence architecture to maintain the state of user devices and content licenses. Traditional solutions would not meet the demands of a globally distributed and scaled service. We will explore the technical decisions behind the choice of a Cassandra Event Sourcing data store.

Liz Rice is Technology Evangelist with container security specialists Aqua Security. Prior to that she was co-founder of Microscaling Systems, developing a real-time container scaling engine, and MicroBadger, the popular tool for exploring image metadata. Liz has a wealth of software development, team, and product management experience from her years spent working on network protocols and distributed systems and in digital technology sectors such as VOD, music, and VoIP. When not building startups and writing code, Liz loves riding bikes in places with better weather than her native London.

Presentations

Your (container) secret's safe with me Session

In a containerized deployment, how do you safely pass secrets - like passwords and certificates - between containers without compromising their safety? If orchestration means a container can run on any machine in the cluster, how do you minimize who knows your secrets? This talk will explore and demonstrate the risks, and discuss best practices for keeping your secrets safe.

Presentations

Nick Rockwell Keynote

Keynote with Nick Rockwell

For over a decade, Andrew has been getting the chocolate of dev in the peanut butter of ops, and vice versa. Currently Senior Backend Engineer – Video Systems at Vimeo, he has also been a developer at Shutterstock, spoken at (and organized streaming video for) prominent Perl conferences, and contributed some cool stuff to CPAN.

Presentations

Load Balancing, Consistent Hashing, and Locality Session

Serving a billion requests per day with a dynamic video packager makes unique demands on a load balancer. This talk will discuss how a new consistent hashing algorithm developed by Google researchers helped us improve cache locality and optimize our delivery, and how we made a contribution to open-source software in the process.

Ed Rousseau is a principal DevOps engineer at Cure Forward. He has a passion for open source, simplicity, and QA. Ed has worked in engineering at both large companies (Red Hat) and cloud-native startups (SimpliSafe, Cure Forward).

Presentations

Developing Resilient Microservices with Kubernetes and Envoy Tutorial

Microservices is an increasingly popular approach to building cloud-native applications. Dozens of new technologies that streamline adopting microservices development such as Docker, Kubernetes, and Envoy have been released over the past few years. But how do you actually use these technologies together to develop, deploy, and run microservices?

Alex Rukletsov is an Apache committer and Mesos PMC member at Mesosphere. Prior to that, Alex was segmenting medical images and investigating the behaviour of human vessels in several German research institutes. His areas of interest include distributed systems, object recognition, and probabilistic and heuristic algorithms.

Presentations

Health Checking: a not-so-trivial task in the distributed containerized world Session

Application health checking and probing have existed since the dawn of computer science. Usually seen as a trivial task, health checking becomes more involved when applied to distributed cloud-native apps. In this talk we will explore the challenges and perils of modern health checking, and will share some lessons learned during the revamp of the Apache Mesos health checks subsystem.

Passionate about human beings and their means of communication, Cynthia Savard Saucier has always sought a deeper understanding of how people think, interact, and connect. Communication has thus been a key ingredient in her technological endeavors. A Université de Montréal graduate in industrial design, she was awarded the 2010 RAÉDIUM prize for Chouette!, a technological communication platform to strengthen intergenerational relations. Her slick solution allows preschool kids and seniors to share the same interface, using the poetry of light and play.

Since she joined Shopify as a project lead and user experience designer, Cynthia has sprinkled her creativity in many important projects, notably working on web and mobile interfaces. In a field that is not always fully understood, she excels by creating smart, emotional connections between companies and users. Cynthia has a knack for strategic design, ergonomics, and problem solving. Her wide range of experience has brought her broad recognition as a leading expert on multiplatform interface design.

In addition to her day job, Cynthia mentors startups and is regularly invited to speak at events around the world, where her playful approach both startles and charms. In her conference presentations, she shares her passion for her point of view: user-centered design is a reality, not a utopian methodology.

Presentations

The Impact of Design: How Design Influences Outcomes Keynote

Keynote by Cynthia Savard Saucier

Rafael Schloming is the CTO of datawire.io, a coauthor of the Advanced Message Queuing Protocol (AMQP) specification, and primary architect of the open source Apache Qpid Proton project. Previously, Rafael was a principal software engineer at Red Hat, working on messaging technologies.

Presentations

Developing Resilient Microservices with Kubernetes and Envoy Tutorial

Microservices is an increasingly popular approach to building cloud-native applications. Dozens of new technologies that streamline adopting microservices development such as Docker, Kubernetes, and Envoy have been released over the past few years. But how do you actually use these technologies together to develop, deploy, and run microservices?

Baron Schwartz is founder and CEO of VividCortex, the best way to see what your production database servers are doing. He is the lead author of High Performance MySQL and a variety of open-source software.

Presentations

Instrumenting systems for arbitrary observability Session

Observability (or lack thereof) is a fundamental property of systems, like testability and maintainability. But what does observable code *look* like? What instrumentation creates systems that are observable later in arbitrary ways, in circumstances you can't foresee? In this talk you'll learn a small, pragmatic set of things you can instrument that'll provide high leverage and utility later.

Yuri Shkuro is a staff engineer at Uber Technologies, working on distributed tracing, reliability, and performance. Yuri is the coauthor of the OpenTracing standard, a project within the Linux Foundation’s Cloud Native Computing Foundation (CNCF). Previously, Yuri worked in the financial industry developing derivatives trading and risk management systems. He holds a PhD in machine learning from the University of Maryland.

Presentations

Before joining AWS, Julien served for 10 years as CTO/VP Engineering in top-tier web startups. Thus, he’s particularly interested in all things architecture, deployment, performance, scalability and data. As a Principal Technical Evangelist, Julien speaks very frequently at conferences and technical workshops, where he meets developers and enterprises to help them bring their ideas to life thanks to the Amazon Web Services infrastructure.

Presentations

FPGAs in the Cloud? Session

AWS has recently introduced FPGA-powered instances (aka f1 instances). In this technical talk, we’ll show how you can build an FPGA-enabled application, from design to simulation to synthesis to execution on an f1 instance. Not your typical cloud computing demo, then!

Ines Sombra is a director of engineering at Fastly, where she spends her time helping the web go faster. Ines holds an MS in computology with an emphasis on cheesy ’80s rock ballads. She has a fondness for steak, fernet, and a pug named Gordo. In a previous life, Ines was a data engineer.

Presentations

Tuesday Opening Welcome Keynote

Tuesday Opening Welcome

Wednesday Opening Welcome Keynote

Wednesday Opening Welcome

Cindy Sridharan is a developer at imgix based in San Francisco, with a strong interest in operations, systems programming, and infrastructure. She organizes the SF Prometheus meetup.

Presentations

Monitoring in the time of Cloud Native Session

As the systems we build become more distributed and (in the case of containerization) ephemeral, traditional monitoring tools of the past prove to be grossly insufficient. Fortunately, the state of monitoring has evolved as well to meet these new demands, but brings its own set of technical and organizational challenges. This talk aims to provide an honest overview of challenges and tradeoffs.

I manage the stream platform, data warehousing and ML training platform efforts at Pinterest.

I am an engineer at heart and have worked as an IC for more than 12 years building distributed systems and cloud platforms at various large companies.

Presentations

Building real-time data pipeline to support efficient low-latency applications Session

We had to support real-time decision making while operating on petabyte-scale data. We modeled our warehouse on a quasi-Kappa architecture (rather than Lambda), treating batch processing as a special case of stream processing. In this talk, we'll highlight our architecture, its impact on our systems, tools and processes, and demonstrate how we modeled real-time ads analytics on the platform.

Mary Treseler is a content director at O’Reilly Media, where she leads an editorial team that covers a wide range of topics from design to DevOps. Mary is the cochair of the O’Reilly Design Conference and Velocity Conference. She has been working on technical content for 25 years, acquiring and developing content in areas such as programming, software engineering, and product design. A Boston native, Mary lives oceanside in Padanaram, MA.

Presentations

Tuesday Opening Welcome Keynote

Tuesday Opening Welcome

Wednesday Opening Welcome Keynote

Wednesday Opening Welcome

James Turnbull is the CTO of Empatico. A long-time member of the open source community, James is the author of nine technical books about open source software: The Terraform Book, The Art of Monitoring, The Logstash Book, The Docker Book, Pro Puppet, Pulling Strings with Puppet, Pro Linux System Administration, Pro Nagios 2.0, and Hardening Linux. He was formerly CTO at Kickstarter and an advisor at Docker. James likes food, wine, books, photography, and cats. He is not overly keen on long walks on the beach and holding hands.

Presentations

Tuesday Opening Welcome Keynote

Tuesday Opening Welcome

Wednesday Opening Welcome Keynote

Wednesday Opening Welcome

Sarah Wells is currently leading work at the Financial Times on building a semantic publishing platform, making it easy to discover and access all the FT’s published content via APIs in a common and flexible format.

Sarah has been a developer for 15 years, working across consultancy, financial services, and media. She is more dev than ops, but definitely shifting. Her recent focus has been on Go, microservices, containerisation, devops, and how to influence teams to do the right things.

Presentations

Operating microservices: everything is at scale Session

Most people think about microservices as a solution for scale. That may be the case, but operating them is definitely a scale challenge. When you have 100+ services, everything needs to be automated, or else you'll spend two days updating Jenkins build pipelines, and get woken up every night by false alarms caused by network blips. I'll tell you how to handle that operational challenge!

David Woods is a professor at the Ohio State University, where he is the lead for the Initiative on Complexity in Natural, Social, and Engineered Systems and the codirector of Ohio State University’s Cognitive Systems Engineering Laboratory. David is a former president of both the Resilience Engineering Association and the Human Factors and Ergonomics Society.

Presentations

David Woods & Richard Cook Keynote

Keynote with David Woods & Richard Cook

Susie is a senior software engineer at LinkedIn, currently focused on scalability and capacity analysis. She also worked on mobile applications and automation in her early career.

Presentations

How LinkedIn Determines the Capacity Limits of its Services using Live Traffic Session

This talk will describe how one can leverage live production traffic to determine peak throughput bottlenecks. We will share the methodology LinkedIn uses to determine service and resource bottlenecks at scale through a tool called "Redliner" and how you can use your current architecture to do the same.

Zhenzhong Xu is currently a Software Engineer working on highly scalable and resilient streaming data infrastructure at Netflix. Previously, he was a core contributor to Microsoft Azure datacenter operating system reconciliation management & resiliency functionalities. He is passionate about anything related to real-time data systems & large-scale distributed systems.

Presentations

Running a Massively Parallel Stream Processing System at Netflix Session

Keystone is the critical piece of Netflix's backend data infrastructure that handles massive data movement and real-time event processing. The talk will dive deep into its architecture and underlying stream processing engines. It will provide insights & proven paths on how we achieved multi-tenancy, scalability and resilience in a cloud-native, complex distributed system environment.