Build & maintain complex distributed systems
October 1–2, 2017: Training
October 2–4, 2017: Tutorials & Conference
New York, NY

Sessions

Learn new skills and best practices from expert speakers. Velocity sessions take place Tuesday, October 3 and Wednesday, October 4.

Tuesday, October 3

11:35am–12:15pm Tuesday, October 3, 2017
Location: Beekman
Alexander Rukletsov (Mesosphere)
Average rating: 2.88 (8 ratings)
Application health checking and probing have existed since the dawn of computer science. Usually seen as a trivial task, health checking becomes more involved when applied to distributed cloud-native apps. Alexander Rukletsov discusses the challenges and perils of modern health checking and shares lessons learned during the revamp of the Apache Mesos health checks subsystem.
11:35am–12:15pm Tuesday, October 3, 2017
Location: Regent
Joseph Breuer (Netflix), Robert Reta (Netflix)
Average rating: 5.00 (1 rating)
The Netflix download feature allows users to download content for offline playback. Implementing this feature required a new persistence architecture to maintain the state of user devices and content licenses. Joseph Breuer and Robert Reta explore the technical decisions behind the choice of a Cassandra event sourcing data store.
11:35am–12:15pm Tuesday, October 3, 2017
Location: Gramercy
Bryan Liles (Heptio)
Average rating: 5.00 (1 rating)
Our industry is continuing to mature, and there is a path for you. Bryan Liles explores paths for starting a career in systems engineering, ideas on where we could go in the future, and how words, technology, and empathy impact the people and projects we interact with every day.
11:35am–12:15pm Tuesday, October 3, 2017
Location: Nassau
Andrew Betts (Fastly)
Average rating: 5.00 (2 ratings)
Most people working with CDN caches know about the Vary header, but few properly understand what it really does. With the advent of the Key header, new patterns for varying cache content will soon emerge. Andrew Betts shares common and advanced use cases for Vary, such as language, A/B testing, compression, and service worker support, and outlines potential changes to consider when Key arrives.
11:35am–12:15pm Tuesday, October 3, 2017
Location: Grand Ballroom West
VM Brasseur (Juniper Networks), Deb Nicholson (Software Freedom Conservancy)
Average rating: 5.00 (1 rating)
Are you managing distributed teams with very different stakeholders, perhaps even a mix of hobbyists and paid staff? It probably seemed easy at first, but the further you travel, the more unfamiliar the terrain appears. Luckily, this is not new ground. Many have gotten lost here before and found their way out again. VM Brasseur and Deb Nicholson share a map to productive, happy teams.
11:35am–12:15pm Tuesday, October 3, 2017
Location: Murray Hill East B
Duncan McAllister (Akamai Technologies), Akshay Ranganath (Akamai Technologies)
CDN automation and pipeline integration can often be a daunting task. Too often these services are integrated late in the delivery process, traditionally in the QA or production deployment phases. Duncan McAllister and Akshay Ranganath share approaches that account for CDNs much earlier in the development lifecycle and highlight specific considerations around CI/CD pipeline integration.
1:30pm–2:10pm Tuesday, October 3, 2017
Location: Beekman
Liz Rice (Aqua Security)
Average rating: 4.67 (3 ratings)
In a containerized deployment, how do you safely pass secrets like passwords and certificates between containers without compromising their safety? If orchestration means a container can run on any machine in the cluster, how do you minimize who knows your secrets? Liz Rice explores the risks and shares best practices for keeping your secrets safe.
1:30pm–2:10pm Tuesday, October 3, 2017
Location: Regent
Ben Linsay (Bumpers)
Average rating: 2.50 (2 ratings)
Machine learning is as accessible as it has ever been, but it’s not always obvious how to go from a cool paper to serving production traffic. Ben Linsay helps you get started putting your paper into production, sharing lessons learned solving real problems with machine learning at Kickstarter.
1:30pm–2:10pm Tuesday, October 3, 2017
Location: Gramercy
Guy Podjarny (Snyk)
Average rating: 5.00 (1 rating)
Serverless means handing off server management to the cloud platforms, along with their security risks. With the “pros” ensuring our servers are patched, what’s left for application owners to protect? As it turns out, quite a lot. Guy Podjarny explores the aspects of security serverless doesn’t solve, the problems it could make worse, and the tools and practices you can use to keep yourself safe.
1:30pm–2:10pm Tuesday, October 3, 2017
Location: Nassau
Andrew Rodland (Vimeo)
Average rating: 4.33 (3 ratings)
Serving a billion requests per day with a dynamic video packager makes unique demands on a load balancer. Andrew Rodland shares a new consistent hashing algorithm developed by Google researchers that helped improve cache locality and optimize delivery, and made a contribution to open source software in the process.
1:30pm–2:10pm Tuesday, October 3, 2017
Location: Grand Ballroom West
Margaret Gourlay (VictorOps)
Average rating: 3.80 (5 ratings)
In 2005, a World of Warcraft bug helped epidemiological research in unexpected ways. Margaret Gourlay draws on this research to share insight into what works and what doesn’t for functional teams and explains how using these ideas has helped VictorOps strategically grow its engineering team in unexpected ways.
1:30pm–2:10pm Tuesday, October 3, 2017
Location: Murray Hill East B
Oded Keret (Micro Focus)
Oded Keret shares HPE's performance testing experience, the challenges the company overcame, and the lessons learned along the way.
2:25pm–3:05pm Tuesday, October 3, 2017
Location: Beekman
Michelle Noorali (Microsoft Azure)
Average rating: 2.33 (3 ratings)
Container orchestration platform Kubernetes has seen unprecedented traction and adoption in the last few years. However, it can be difficult to figure out how to actually deploy your applications on Kubernetes if you're new to the space. Michelle Noorali walks you through configuring, deploying, and managing applications on Kubernetes using an open source tool called Helm.
2:25pm–3:05pm Tuesday, October 3, 2017
Location: Regent
Leif Walsh (Two Sigma)
Average rating: 2.00 (1 rating)
Leif Walsh offers an overview of Flint, Two Sigma's open source time series extension to Spark, explains how it fits in with the Spark programming model, and lays out the roadmap for the future of pandas, PySpark, and Flint.
2:25pm–3:05pm Tuesday, October 3, 2017
Location: Gramercy
Jonathan Moore (Comcast Cable)
Average rating: 5.00 (1 rating)
How does a large 50-year-old company go from purchasing much of its technology and working with yearlong release cycles to building multiple products in-house and releasing daily? Jon Moore traces the changing set of tools, techniques, and attitudes that have powered (and still power) this transformation at Comcast over the last decade, mapping out a path you can follow in your own company.
2:25pm–3:05pm Tuesday, October 3, 2017
Location: Nassau
Felix Glaser (Shopify)
Average rating: 4.67 (3 ratings)
During flash sales, when milliseconds matter, bots buy faster than humans. These bots created a constant load on Shopify’s infrastructure and SREs, until the company decided to create an automated system to detect and block nearly all bot traffic on its load balancers. Felix Glaser offers an overview of this system and shares the challenges Shopify faced differentiating between bots and humans.
2:25pm–3:05pm Tuesday, October 3, 2017
Location: Grand Ballroom West
Lisa Phillips (Fastly)
Lisa Phillips shares strategies for overcoming individual and organizational management challenges in a globally diverse environment and explores people management challenges and methods for working with even the grumpiest admin.
2:25pm–3:05pm Tuesday, October 3, 2017
Location: Murray Hill East B
Mike Strickland (Intel Corporation)
Microsoft has widely deployed field-programmable gate arrays (FPGAs) for accelerating search, networking, and machine learning, with a little help from the company’s software expertise and its FPGA programmers. Mike Strickland explains how a single FPGA can deliver significant acceleration for multiple workloads.
3:50pm–4:30pm Tuesday, October 3, 2017
Location: Beekman
Ilan Rabinovitch (Datadog)
Average rating: 2.33 (6 ratings)
Drawing on real-world metrics data from thousands of organizations, Ilan Rabinovitch shares the latest trends in container adoption and use, explores the types of applications organizations are running in containers, and explains how to best monitor these containerized applications.
3:50pm–4:30pm Tuesday, October 3, 2017
Location: Regent
Rob Dickinson (resurface.io)
On the surface, adapting software to use persistent memory seems obvious. After all, persistent memory is simply fast memory that maintains state when the power goes out, like an SSD. But unlike SSDs, persistent memory challenges long-held ideas and conventions about how software works. Rob Dickinson outlines four key ideas that will help focus your persistent memory strategy.
3:50pm–4:30pm Tuesday, October 3, 2017
Location: Gramercy
Sarah Wells (Financial Times)
Average rating: 4.86 (7 ratings)
Most people think about microservices as a solution for scale. That may be the case, but operating them is definitely a scale challenge. Sarah Wells explains why, when you have 100+ services, everything needs to be automated, or else you'll spend two days updating Jenkins build pipelines or be woken up every night by false alarms caused by network blips.
3:50pm–4:30pm Tuesday, October 3, 2017
Location: Nassau
Kristopher Beevers (NS1)
Kristopher Beevers discusses the evolution of the edge delivery architecture of major DNS service provider NS1, from its earliest prototypes to the large, heavily automated global network it operates today, and the many operational lessons learned along the way.
3:50pm–4:30pm Tuesday, October 3, 2017
Location: Grand Ballroom West
John Le Drew (Wise Noodles)
Average rating: 5.00 (1 rating)
John Le Drew draws on the hours of interviews he conducted with some of the most respected people in the industry for the Agile Path podcast to explain what psychological safety is and why you should care about it, as he walks you through a series of highly interactive role-playing and improvisation exercises.
3:50pm–4:30pm Tuesday, October 3, 2017
Location: Murray Hill East B
Arshan Dabirsiaghi (Contrast Security)
Average rating: 4.50 (2 ratings)
Arshan Dabirsiaghi explains what Contrast Security learned from the Struts 2 exploit and details how to stop the next attack against your production apps.
4:45pm–5:25pm Tuesday, October 3, 2017
Location: Beekman
Kelly Looney (Skytap)
Average rating: 2.50 (2 ratings)
Kelly Looney shares an incremental approach to introducing containers into complex, distributed applications, resulting in modernization with less risk and more reward. You’ll learn how to evaluate which components of your applications are best suited for containers, how to experiment safely and get fast feedback, and how to increase and scale your container adoption.
4:45pm–5:25pm Tuesday, October 3, 2017
Location: Regent
Andrew Turley (Wallaroo Labs)
The cost of coordinating access to information in a distributed system increases as the system scales up. Andrew Turley offers an overview of the entity-based approach to addressing this issue and explains how it has influenced the design of Wallaroo, a platform for building high-performance, event-driven systems.
4:45pm–5:25pm Tuesday, October 3, 2017
Location: Gramercy
Blake Bisset (Independent), Jonah Horowitz (Stripe)
Average rating: 4.50 (2 ratings)
People aren't just wrong on the internet. Sometimes they bring it back to the office. Blake Bisset and Jonah Horowitz share stories about anti-patterns in monitoring, incident response, configuration management, and more, and explain how Google and Netflix view the role of the SRE (and how it differs from the traditional system administrator role).
4:45pm–5:25pm Tuesday, October 3, 2017
Location: Nassau
Susie Xia (LinkedIn), Anant Rao (LinkedIn)
Average rating: 4.67 (6 ratings)
Susie Xia and Anant Rao explain how LinkedIn leverages live production traffic to determine service and resource bottlenecks at scale with a tool called Redliner and how you can use your current architecture to do the same.
4:45pm–5:25pm Tuesday, October 3, 2017
Location: Grand Ballroom West
Kellan Elliott-McCrea (Blink Health)
Average rating: 5.00 (1 rating)
Kellan Elliott-McCrea explains how to lead technical decision making for high-performing teams.
4:45pm–5:25pm Tuesday, October 3, 2017
Location: Murray Hill East B
Matt Cutts (United States Digital Service (USDS))
Average rating: 5.00 (1 rating)
When the Healthcare.gov website failed, it was a turning point and an opportunity. In the last few years, hundreds of engineers, designers, and product managers have signed up to do tours of service in government. Matt Cutts explores what happens when technology and government mix. A lot of interesting things, it turns out.

Wednesday, October 4

11:35am–12:15pm Wednesday, October 4, 2017
Location: Beekman
Cindy Sridharan
Average rating: 3.00 (3 ratings)
As the systems we build become more distributed and (in the case of containerization) ephemeral, traditional monitoring tools prove to be grossly insufficient. Fortunately, the state of monitoring has evolved to meet these new demands, but it brings its own set of technical and organizational challenges. Cindy Sridharan offers an honest overview of monitoring challenges and trade-offs.
11:35am–12:15pm Wednesday, October 4, 2017
Location: Regent
Jeffrey Valeo (Grubhub)
Average rating: 2.00 (1 rating)
Load testing is a complicated and time-consuming process in the world of monolithic applications. And with the move to distributed systems (microservices), it is even more complicated. Jeffrey Valeo draws on real-world examples to share tips on how to effectively load-test distributed systems.
11:35am–12:15pm Wednesday, October 4, 2017
Location: Gramercy
Mike McGarr (Netflix)
Average rating: 5.00 (2 ratings)
Netflix has always been a Java shop, from its early DVD days to its migration to the cloud. This simplified the job for centralized teams, but as the popularity of non-JVM languages has risen, these teams have begun to rethink their support strategy. Mike McGarr discusses the early days of Netflix's polyglot journey and where the company is going in the future.
11:35am–12:15pm Wednesday, October 4, 2017
Location: Nassau
Jack Chan (Shutterfly)
Average rating: 2.00 (1 rating)
Jack Chan describes how Shutterfly migrated metadata from over 10B photos from a private data center into AWS in 100 days and explores designs to absorb mountains of metadata, on-premises ecommerce integration, and parallel user experiences, all in a highly scalable fashion. Shutterfly Photos is now a hybrid cloud solution with images hosted on-premises and client-facing photos' metadata on AWS.
11:35am–12:15pm Wednesday, October 4, 2017
Location: Grand Ballroom West
Robert Claire (Pinterest)
Average rating: 4.00 (1 rating)
Rob Claire explores the technical challenges and lessons learned in building a monitoring stack that can reliably process millions of events per second, covering specific technologies (including Spark Streaming, Kafka, and HBase) and best practices for managing and monitoring data.
11:35am–12:15pm Wednesday, October 4, 2017
Location: Murray Hill East B
Michal Skiba (Intel Corporation)
Field-programmable gate arrays (FPGAs), customizable digital circuits capable of processing large amounts of data incredibly quickly, have traditionally required deep expertise to program. Michal Skiba explains how Intel is helping developers accelerate their cloud applications through a software stack that greatly simplifies the use and management of FPGAs.
1:30pm–2:10pm Wednesday, October 4, 2017
Location: Beekman
Mark McBride (Turbine Labs)
Average rating: 4.00 (1 rating)
With the recent flourishing of observability systems, there's no shortage of things to monitor. Sadly, humans have limited capacity to process them all. Mark McBride outlines three key metrics (request rate, success rate, and the latency histogram) that provide a high-level abstraction of the customer experience. If these three metrics are good, your system is healthy from a customer perspective.
1:30pm–2:10pm Wednesday, October 4, 2017
Location: Regent
Terran Melconian (Air Network Simulation and Analysis)
Average rating: 3.50 (4 ratings)
Terran Melconian explores an organized process for observing a misbehaving complex system, reasoning about possible causes, and isolating the fault. While it is not generally taught, all the successful senior engineers with operational experience Terran has talked to use a variant of this process.
1:30pm–2:10pm Wednesday, October 4, 2017
Location: Gramercy
Kate Deutscher (GreenSync)
Average rating: 4.67 (3 ratings)
Kate Deutscher explores common pitfalls to automating software delivery and explains how to find the processes in your delivery pipeline that can benefit the most from automation, focusing on three patterns commonly seen in automation tooling, backed by real-world case studies of when each pattern has worked well and when it has ended in rampant failure.
1:30pm–2:10pm Wednesday, October 4, 2017
Location: Nassau
Andrew Fong (Dropbox)
In 2016, Dropbox migrated 600 petabytes of data from managed cloud storage into its own data centers. Andrew Fong shares lessons and best practices for data migrations learned from this experience.
1:30pm–2:10pm Wednesday, October 4, 2017
Location: Grand Ballroom West
Zhenzhong Xu (Netflix)
Average rating: 5.00 (2 ratings)
Keystone, a critical piece of Netflix's backend data infrastructure, ensures massive data movements and real-time event processing. Zhenzhong Xu leads a deep dive into Keystone's architecture and underlying stream processing engines, sharing insights and proven paths on how the company achieves multitenancy, scalability, and resilience in a complex cloud-native distributed system environment.
1:30pm–2:10pm Wednesday, October 4, 2017
Location: Murray Hill East B
Kelsey Hightower (Google)
Average rating: 5.00 (3 ratings)
Kubernetes has become the go-to open source framework for managing containers and building application platforms that scale from 1 to 5,000 machines. Kelsey Hightower offers an overview of the Kubernetes 1.8 release and explains why this trend will continue.
2:25pm–3:05pm Wednesday, October 4, 2017
Location: Beekman
Baron Schwartz (VividCortex)
Observability (or lack thereof), like testability and maintainability, is a fundamental property of systems. But what does observable code look like? What instrumentation creates systems that are observable later in arbitrary ways, in circumstances you can't foresee? Baron Schwartz outlines the most useful things to know about observability in systems in production.
2:25pm–3:05pm Wednesday, October 4, 2017
Location: Regent
Tyler McMullen (Fastly)
Average rating: 3.50 (2 ratings)
Many words have been spilled about distributed systems. Most of the time though, what we talk about are algorithms and techniques. But the practical realities of distributed systems are far from straightforward. Tyler McMullen outlines a new approach built to perform very high volumes of health checks across a cluster of machines for reliability and scalability.
2:25pm–3:05pm Wednesday, October 4, 2017
Location: Gramercy
Tanya Reilly (Squarespace)
Average rating: 5.00 (5 ratings)
Tanya Reilly explores the parts of disaster recovery you might be less prepared for, covering why the best laid fallback plans tend to go wrong and why you should start deliberately managing your dependencies long before you think you need to.
2:25pm–3:05pm Wednesday, October 4, 2017
Location: Nassau
Julien Simon (AWS)
FPGAs have become a hot topic in the IT industry, thanks to the unprecedented computing power that they bring to demanding HPC applications, and AWS recently introduced FPGA-powered instances (aka F1 instances) to make the process simpler and quicker. Julien Simon walks you through building an FPGA-enabled application, from design to simulation to synthesis to execution on an F1 instance.
2:25pm–3:05pm Wednesday, October 4, 2017
Location: Grand Ballroom West
Swaminathan Sundaramurthy (Salesforce Inc), Mark Cho (Pinterest)
Pinterest has to support real-time decision making while operating on petabyte-scale data. Swaminathan Sundaramurthy and Mark Cho offer an overview of Pinterest's real-time data pipeline (modeled on quasi-Kappa architecture), its impact on the company's systems, and the tools and processes used, and demonstrate how Pinterest models real-time ads analytics on the platform.
2:25pm–3:05pm Wednesday, October 4, 2017
Location: Murray Hill East B
David Belson (Oracle+Dyn)
Average rating: 1.00 (1 rating)
Although we often think of “breaking the internet” in the context of a website that couldn't handle the traffic associated with a piece of viral media content, behind the scenes, critical pieces of internet infrastructure break on a regular basis. David Belson dives into some of these issues and explains how you can avoid being impacted by them.
3:50pm–4:30pm Wednesday, October 4, 2017
Location: Beekman
Dina Goldshtein (Riverbed)
Event Tracing for Windows (ETW) is the most important diagnostic tool Windows developers have at their disposal. Dina Goldshtein explores the rich and wonderful world of ETW events, which span numerous OS components. You’ll learn how to diagnose complex issues in production systems and discover ways to automate ETW collection and analysis to build self-diagnosing applications.
3:50pm–4:30pm Wednesday, October 4, 2017
Location: Regent
Brendan Burns (Microsoft)
Average rating: 3.00 (3 ratings)
Formal patterns for distributed systems make it significantly easier to design and deploy reliable, scalable distributed systems. Brendan Burns explains how to transform these patterns into containers and a custom Kubernetes API, which you can use to simply instantiate a distributed system via a declarative API.
3:50pm–4:30pm Wednesday, October 4, 2017
Location: Gramercy
Lex Neva (Fastly)
Average rating: 4.50 (4 ratings)
When the DDoS attack crushed Dyn last October, did your DNS fail? Heroku's sure did. In response, Lex Neva deep dove into everything DNS to learn how to implement resilient DNS properly: reading RFCs, asking questions of pros, and performing real-world experiments when no one knew the answers. Join Lex to find out what does work and all the crazy details of DNS that he uncovered.
3:50pm–4:30pm Wednesday, October 4, 2017
Location: Nassau
Ignat Korchagin (Cloudflare)
Ever wondered how to quickly and efficiently roll over all of your servers’ SSH keys or how to securely manage diskless systems? Ignat Korchagin outlines a simple approach that combines hardware support and a little cryptography to help operationalize the management of all the secrets in your cloud.
3:50pm–4:30pm Wednesday, October 4, 2017
Location: Grand Ballroom West
Kevin Beck (New Relic)
Average rating: 2.50 (2 ratings)
New Relic customers send monitoring data to New Relic servers every minute, a continuous firehose of data. Drawing on his experience at New Relic, Kevin Beck shares best practices for building a streaming service based on Apache Kafka, self-monitoring for reliability and fault tolerance, and building a DevOps culture that anticipates and prevents outages.
4:45pm–5:25pm Wednesday, October 4, 2017
Location: Beekman
Sasha Goldshtein (Sela Group)
Sasha Goldshtein explores a holistic set of BPF-based tools for monitoring JVM applications on Linux and outlines a systems performance checklist that includes classics like fileslower, opensnoop, and strace, all based on the noninvasive, fast, and safe BPF technology.
4:45pm–5:25pm Wednesday, October 4, 2017
Location: Regent
Karthik Kirupanithi (Amazon Web Services)
Voice UIs like Amazon's Alexa can make systems management simple, intuitive, and delightful. The virtual private assistant feel of a VUI, coupled with the abstraction that voice commands bring, breaks the tedium of management tasks. Karthik Kirupanithi demonstrates how to put together an Alexa skill that can perform tasks using the EC2 Systems Manager.
4:45pm–5:25pm Wednesday, October 4, 2017
Location: Gramercy
Nikhil Garg (Quora), Neeraj Agrawal (Quora)
Millions of people visit Quora's home feed to find high-quality content personalized to their interests. It is powered by a highly performant distributed system running sophisticated ML algorithms. Nikhil Garg and Neeraj Agrawal describe the evolution of the home feed's architecture and share several lessons from building and scaling this system.
4:45pm–5:25pm Wednesday, October 4, 2017
Location: Nassau
Oleksandr Petrov (Independent)
In the world of big and fast data, it's important to be fluent in storage and know the right tools for each job. Alex Petrov shares techniques for picking the right database and indexes, understanding the trade-offs different types of storage bring, scaling out your data and planning its growth, and finding the best resources on the subject.
4:45pm–5:25pm Wednesday, October 4, 2017
Location: Grand Ballroom West
Vinu Charanya (Twitter)
Average rating: 4.00 (1 rating)
Twitter is powered by thousands of microservices running on an internal cloud platform, which offers compute, storage, messaging, monitoring, etc. as a service. Vinu Charanya explains how she and her team are building a system that captures, defines, provisions, meters, and charges infrastructure resources, redefining how systems are built atop Twitter infrastructure.