Build & maintain complex distributed systems
October 1–2, 2017: Training
October 2–4, 2017: Tutorials & Conference
New York, NY
 

Schedule

      Beekman
      11:35am Monitoring in the time of cloud native Cindy Sridharan (--)
      1:30pm Customer-centric observability Mark McBride (Turbine Labs)
      2:25pm Instrumenting systems for arbitrary observability Baron Schwartz (VividCortex)
      3:50pm ETW: Monitor anything, anytime, anywhere Dina Goldshtein (Riverbed)
      Regent
      1:30pm Debugging complex systems Terran Melconian (Air Network Simulation and Analysis)
      3:50pm Distributed systems patterns: From design to reality Brendan Burns (Microsoft)
      4:45pm Systems management with a voice UI using Amazon Alexa Karthik Kirupanithi (Amazon Web Services)
      Gramercy
      11:35am The rise of polyglot programming at Netflix Mike McGarr (Netflix)
      1:30pm Automation run rampant Kate Deutscher (GreenSync)
      2:25pm Have you tried turning it off and turning it on again? Tanya Reilly (Squarespace)
      4:45pm Scaling the machine learning systems powering Quora's home feed Nikhil Garg (Quora), Neeraj Agrawal (Quora)
      Nassau
      1:30pm Best practices in data migrations Andrew Fong (Dropbox)
      2:25pm FPGAs in the cloud? Julien Simon (AWS)
      4:45pm What we talk about when we talk about on-disk storage Oleksandr Petrov (Independent)
      Grand Ballroom West
      9:00am Wednesday opening welcome Mary Treseler (O'Reilly Media), James Turnbull (Glitch), Ines Sombra (Fastly)
      9:05am What if serverless was real? Nick Rockwell (The New York Times)
      9:25am Above the line, below the line: A preview of the SNAFUcatchers Stella Report David Woods (Ohio State University SNAFUcatchers), Richard Cook (Ohio State University SNAFUcatchers)
      9:50am The role of open source in a company Jessica Frazelle (Microsoft), Dino Dai Zovi (Capsule8)
      10:10am How real is real-user measurement? (sponsored by Catchpoint) Robert Castley (Catchpoint Systems)
      10:15am Mentorship and sponsorship Lara Hogan (Wherewithall)
      2:25pm Genji: A framework for building resilient near-real-time data pipelines Swaminathan Sundaramurthy (Salesforce Inc), Mark Cho (Pinterest)
      Murray Hill East B
      3:50pm
      4:45pm
      12:15pm Wednesday lunch and Birds of a Feather sessions | Room: Americas Halls 2
      8:15am Wednesday Speed Networking | Room: 3rd Floor Foyer
      8:00am Coffee | Room: Exhibit Hall, Americas Hall I
      10:45am Break | Room: Exhibit Hall, Americas Hall I
      3:05pm Break | Room: Exhibit Hall, Americas Hall I
      11:35am-12:15pm (40m) Monitoring, Tracing and Metrics
      Monitoring in the time of cloud native
      Cindy Sridharan (--)
      As the systems we build become more distributed and (in the case of containerization) ephemeral, traditional monitoring tools prove to be grossly insufficient. Fortunately, the state of monitoring has evolved to meet these new demands, but it brings its own set of technical and organizational challenges. Cindy Sridharan offers an honest overview of monitoring challenges and trade-offs.
      1:30pm-2:10pm (40m) Monitoring, Tracing and Metrics
      Customer-centric observability
      Mark McBride (Turbine Labs)
      With the recent flourishing of observability systems, there's no shortage of things to monitor. Sadly, humans have limited capacity to process them all. Mark McBride outlines three key metrics—request rate, success rate, and the latency histogram—that provide a high-level abstraction of the customer experience. If these three metrics are good, your system is healthy from a customer perspective.
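The three customer-facing metrics named above can be illustrated with a small summary function over one time window of requests. This is a minimal sketch, not the speaker's implementation. Several details are hypothetical assumptions: the `Request` record, the `summarize` function, the latency bucket bounds, and the rule that any status below 500 counts as a success.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Request:
    status: int        # HTTP status code returned to the customer
    latency_ms: float  # end-to-end latency in milliseconds


# Hypothetical histogram bucket upper bounds (ms); tune these per service.
BUCKETS_MS = (10, 50, 100, 500, 1000, float("inf"))


def summarize(requests, window_s):
    """Return (request_rate, success_rate, latency_histogram) for one window."""
    n = len(requests)
    # Request rate: requests per second over the window.
    rate = n / window_s
    # Assumption: only 5xx responses count as failures from the customer's view.
    ok = sum(1 for r in requests if r.status < 500)
    # Success rate is undefined when there is no traffic, so return None.
    success = ok / n if n else None
    # Latency histogram: count each request in the first bucket that contains it.
    hist = Counter()
    for r in requests:
        for bound in BUCKETS_MS:
            if r.latency_ms <= bound:
                hist[bound] += 1
                break
    return rate, success, dict(hist)


reqs = [Request(200, 12.0), Request(200, 80.0), Request(503, 950.0), Request(200, 8.0)]
rate, success, hist = summarize(reqs, window_s=2.0)
print(rate, success, hist)
```

A histogram is used here instead of a single average because averages hide tail latency. Many customers are affected by the slow requests that only show up in the upper buckets.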
      2:25pm-3:05pm (40m) Monitoring, Tracing and Metrics
      Instrumenting systems for arbitrary observability
      Baron Schwartz (VividCortex)
      Observability (or lack thereof), like testability and maintainability, is a fundamental property of systems. But what does observable code look like? What instrumentation creates systems that are observable later in arbitrary ways, in circumstances you can't foresee? Baron Schwartz outlines the most useful things to know about observability in systems in production.
      3:50pm-4:30pm (40m) Monitoring, Tracing and Metrics
      ETW: Monitor anything, anytime, anywhere
      Dina Goldshtein (Riverbed)
      Event Tracing for Windows (ETW) is the most important diagnostic tool Windows developers have at their disposal. Dina Goldshtein explores the rich and wonderful world of ETW events, which span numerous OS components. You’ll learn how to diagnose complex issues in production systems and discover ways to automate ETW collection and analysis to build self-diagnosing applications.
      4:45pm-5:25pm (40m) Monitoring, Tracing and Metrics
      Fast and safe production monitoring of JVM applications with BPF magic
      Sasha Goldshtein (Sela Group)
      Sasha Goldshtein explores a holistic set of BPF-based tools for monitoring JVM applications on Linux and outlines a systems performance checklist that includes classics like fileslower, opensnoop, and strace—all based on the noninvasive, fast, and safe BPF technology.
      11:35am-12:15pm (40m) Distributed Systems
      Lessons learned from load-testing distributed systems
      Jeffrey Valeo (Grubhub)
      Load testing is a complicated and time-consuming process in the world of monolithic applications. And with the move to distributed systems (microservices), it is even more complicated. Jeffrey Valeo draws on real-world examples to share tips on how to effectively load-test distributed systems.
      1:30pm-2:10pm (40m) Distributed Systems, Systems Engineering
      Debugging complex systems
      Terran Melconian (Air Network Simulation and Analysis)
      Terran Melconian explores an organized process for observing a misbehaving complex system, reasoning about possible causes, and isolating the fault. While it is not generally taught, all the successful senior engineers with operational experience Terran has talked to use a variant of this process.
      2:25pm-3:05pm (40m) Distributed Systems
      Building a skyscraper with Legos: The anatomy of a distributed system
      Tyler McMullen (Fastly)
      Many words have been spilled about distributed systems. Most of the time though, what we talk about are algorithms and techniques. But the practical realities of distributed systems are far from straightforward. Tyler McMullen outlines a new approach built to perform very high volumes of health checks across a cluster of machines for reliability and scalability.
      3:50pm-4:30pm (40m) Distributed Systems, Orchestration, Scheduling, and Containers
      Distributed systems patterns: From design to reality
      Brendan Burns (Microsoft)
      Formal patterns for distributed systems make it significantly easier to design and deploy reliable, scalable distributed systems. Brendan Burns explains how to transform these patterns into containers and a custom Kubernetes API, which you can use to simply instantiate a distributed system via declarative API.
      4:45pm-5:25pm (40m) Distributed Systems, Systems Engineering
      Systems management with a voice UI using Amazon Alexa
      Karthik Kirupanithi (Amazon Web Services)
      Voice UIs like Amazon's Alexa can make systems management simple, intuitive, and delightful. The virtual private assistant feel of a VUI, coupled with the abstraction that voice commands bring, breaks the tedium of management tasks. Karthik Kirupanithi demonstrates how to put together an Alexa skill that can perform tasks using the EC2 Systems Manager.
      11:35am-12:15pm (40m) Resilience Engineering
      The rise of polyglot programming at Netflix
      Mike McGarr (Netflix)
      Netflix has always been a Java shop, from its early DVD days to its migration to the cloud. This simplified the job for centralized teams, but as the popularity of non-JVM languages has risen, these teams have begun to rethink their support strategy. Mike McGarr discusses the early days of Netflix's polyglot journey and where the company is going in the future.
      1:30pm-2:10pm (40m) Resilience Engineering, Systems Engineering
      Automation run rampant
      Kate Deutscher (GreenSync)
      Kate Deutscher explores common pitfalls in automating software delivery and explains how to find the processes in your delivery pipeline that can benefit most from automation. She focuses on three patterns commonly seen in automation tooling, backed by real-world case studies of when each pattern has worked well and when it has ended in rampant failure.
      2:25pm-3:05pm (40m) DevOps & Tools, Systems Engineering
      Have you tried turning it off and turning it on again?
      Tanya Reilly (Squarespace)
      Tanya Reilly explores the parts of disaster recovery you might be less prepared for, covering why the best laid fallback plans tend to go wrong and why you should start deliberately managing your dependencies long before you think you need to.
      3:50pm-4:30pm (40m) Resilience Engineering
      The phone book is on fire: Lessons from the Dyn DNS DDoS attack
      Lex Neva (Fastly)
      When the DDoS attack crushed Dyn last October, did your DNS fail? Heroku's sure did. In response, Lex Neva took a deep dive into DNS to learn how to implement resilient DNS properly: reading RFCs, asking questions of pros, and performing real-world experiments when no one knew the answers. Join Lex to find out what does work and all the crazy details of DNS that he uncovered.
      4:45pm-5:25pm (40m) Resilience Engineering, Systems Engineering
      Scaling the machine learning systems powering Quora's home feed
      Nikhil Garg (Quora), Neeraj Agrawal (Quora)
      Millions of people visit Quora's home feed to find high-quality content personalized to their interests. It is powered by a highly performant distributed system running sophisticated ML algorithms. Nikhil Garg and Neeraj Agrawal describe the evolution of the home feed's architecture and share several lessons from building and scaling this system.
      11:35am-12:15pm (40m) Hardware, Storage, and Datacenters
      How Shutterfly migrated 10+ billion photos to the cloud
      Jack Chan (Shutterfly)
      Jack Chan describes how Shutterfly migrated metadata from over 10B photos from a private data center into AWS in 100 days and explores designs to absorb mountains of metadata, on-premises ecommerce integration, and parallel user experiences, all in a highly scalable fashion. Shutterfly Photos is now a hybrid cloud solution with images hosted on-premises and client-facing photos' metadata on AWS.
      1:30pm-2:10pm (40m) Hardware, Storage, and Datacenters
      Best practices in data migrations
      Andrew Fong (Dropbox)
      In 2016, Dropbox migrated 600 petabytes of data from managed cloud storage into its own data centers. Andrew Fong shares lessons and best practices for data migrations learned from this experience.
      2:25pm-3:05pm (40m) Hardware, Storage, and Datacenters
      FPGAs in the cloud?
      Julien Simon (AWS)
      FPGAs have become a hot topic in the IT industry, thanks to the unprecedented computing power that they bring to demanding HPC applications, and AWS recently introduced FPGA-powered instances (aka F1 instances) to make the process simpler and quicker. Julien Simon walks you through building an FPGA-enabled application, from design to simulation to synthesis to execution on an F1 instance.
      3:50pm-4:30pm (40m) Hardware, Storage, and Datacenters
      Managing server secrets at scale with a vaultless password manager
      Ignat Korchagin (Cloudflare)
      Ever wondered how to quickly and efficiently roll over all of your servers’ SSH keys or how to securely manage diskless systems? Ignat Korchagin outlines a simple approach that combines hardware support and a little cryptography to help operationalize the management of all the secrets in your cloud.
      4:45pm-5:25pm (40m) Distributed Data & Databases, Hardware, Storage, and Datacenters
      What we talk about when we talk about on-disk storage
      Oleksandr Petrov (Independent)
      In the world of big and fast data, it's important to be fluent in storage and know the right tools for each job. Alex Petrov shares techniques for picking the right database and indexes, understanding the trade-offs different types of storage bring, scaling out your data and planning its growth, and finding the best resources on the subject.
      9:00am-9:05am (5m)
      Wednesday opening welcome
      Mary Treseler (O'Reilly Media), James Turnbull (Glitch), Ines Sombra (Fastly)
      Mary Treseler, James Turnbull, and Ines Sombra welcome you to the second day of keynotes.
      9:05am-9:25am (20m)
      What if serverless was real?
      Nick Rockwell (The New York Times)
      For most of us, the best approach to scaling complex distributed systems is to not do it at all. Nick Rockwell asks, so why isn't serverless a bigger deal?
      9:25am-9:40am (15m)
      Above the line, below the line: A preview of the SNAFUcatchers Stella Report
      David Woods (Ohio State University SNAFUcatchers), Richard Cook (Ohio State University SNAFUcatchers)
      David Woods and Richard Cook offer a glimpse at the SNAFUcatchers Stella Report.
      9:40am-9:50am (10m) Sponsored
      Automating content delivery in a DevOps world (sponsored by Akamai)
      Craig Adams (Akamai Technologies)
      As the industry moves to distributed systems and a DevOps model, companies must adopt DevOps in order to automate CI/CD workflows and increase deployment velocity. Craig Adams explores the traditional DevOps pipeline, addresses how to think about CDN automation, and explains how Akamai is baking automation into its CDN.
      9:50am-10:10am (20m)
      The role of open source in a company
      Jessica Frazelle (Microsoft), Dino Dai Zovi (Capsule8)
      Jessica Frazelle and Dino Dai Zovi—technologists who have spent their careers managing the balancing act between community and commercial perspectives—discuss how to be effective at open source in your company.
      10:10am-10:15am (5m) Sponsored
      How real is real-user measurement? (sponsored by Catchpoint)
      Robert Castley (Catchpoint Systems)
      Some quarters of our market believe that real-user measurement (RUM) is the end-all and be-all of customer experience management. But with the advancement of ad blockers, RUM tags are often getting blocked. Robert Castley explores the relevancy of real-user data if real users are blocking RUM tags and shares some solutions.
      10:15am-10:35am (20m)
      Mentorship and sponsorship
      Lara Hogan (Wherewithall)
      To grow your technical leadership skills, it's critical to lean on your network of support. Mentors—people who can give you helpful advice—are usually easy enough to find. Lara Hogan explains that what can be even more valuable is finding sponsors—people who can help you find new opportunities and improve the visibility of your work.
      10:35am-10:45am (10m)
      Wednesday Closing
      Wednesday Keynotes
      11:35am-12:15pm (40m) Real time, events, streams & scale
      Drinking from the fire hose: Building a massive-scale monitoring stack
      Robert Claire (Pinterest)
      Rob Claire explores the technical challenges and lessons learned in building a monitoring stack that can reliably process millions of events per second, covering specific technologies (including Spark Streaming, Kafka, and HBase) and best practices for managing and monitoring data.
      1:30pm-2:10pm (40m) Distributed Data & Databases, Real time, events, streams & scale
      Running a massively parallel stream processing system at Netflix
      Zhenzhong Xu (Netflix)
      Keystone, a critical piece of Netflix's backend data infrastructure, ensures massive data movements and real-time event processing. Zhenzhong Xu leads a deep dive into Keystone's architecture and underlying stream processing engines, sharing insights and proven paths on how the company achieves multitenancy, scalability, and resilience in a complex cloud-native distributed system environment.
      2:25pm-3:05pm (40m) Distributed Data & Databases, Real time, events, streams & scale
      Genji: A framework for building resilient near-real-time data pipelines
      Swaminathan Sundaramurthy (Salesforce Inc), Mark Cho (Pinterest)
      Pinterest has to support real-time decision making while operating on petabyte-scale data. Swaminathan Sundaramurthy and Mark Cho offer an overview of Pinterest's real-time data pipeline (modeled on a quasi-Kappa architecture), its impact on the company's systems, and the tools and processes used, and they demonstrate how Pinterest models real-time ads analytics on the platform.
      3:50pm-4:30pm (40m)
      Thriving under a continuous self-inflicted DDoS attack
      Kevin Beck (New Relic)
      New Relic customers send monitoring data to New Relic servers every minute—a continuous firehose of data. Drawing on his experience at New Relic, Kevin Beck shares best practices for building a streaming service based on Apache Kafka, self-monitoring for reliability and fault tolerance, and building a DevOps culture that anticipates and prevents outages.
      4:45pm-5:25pm (40m) Capacity Planning, Real time, events, streams & scale
      How Twitter built a framework to improve infrastructure utilization and efficiency at scale
      Vinu Charanya (Twitter)
      Twitter is powered by thousands of microservices running on an internal cloud platform, which offer compute, storage, messaging, monitoring, etc. as a service. Vinu Charanya explains how she and her team are building a system that captures, defines, provisions, meters, and charges infrastructure resources, redefining how systems are built atop Twitter infrastructure.
      11:35am-12:15pm (40m) Sponsored
      Accelerating cloud applications with Intel FPGAs (sponsored by Intel)
      Michal Skiba (Intel Corporation)
      Field-programmable gate arrays (FPGAs)—customizable digital circuits capable of processing large amounts of data incredibly quickly—have traditionally required deep expertise to program. Michal Skiba explains how Intel is helping developers accelerate their cloud applications through a software stack that greatly simplifies the use and management of FPGAs.
      1:30pm-2:10pm (40m) Sponsored
      What's new in Kubernetes 1.8 (sponsored by Google Cloud)
      Kelsey Hightower (Google)
      Kubernetes has become the go-to open source framework for managing containers and building application platforms that scale from 1 to 5,000 machines. Kelsey Hightower offers an overview of the Kubernetes 1.8 release and explains why this trend will continue.
      2:25pm-3:05pm (40m) Sponsored
      Fixing the internet everyday: Internet volatility, the vigilance that fixes it, and why it matters (sponsored by Oracle+Dyn)
      David Belson (Oracle+Dyn)
      Although we often think of “breaking the internet” in the context of a website that couldn't handle the traffic associated with a piece of viral media content, behind the scenes, critical pieces of internet infrastructure break on a regular basis. David Belson dives into some of these issues and explains how you can avoid being impacted by them.
      3:50pm-4:30pm (40m)
      Session
      To be confirmed
      4:45pm-5:25pm (40m)
      Session
      To be confirmed
      12:15pm-1:30pm (1h 15m)
      Wednesday lunch and Birds of a Feather sessions
      Birds of a Feather (BoF) sessions provide face-to-face exposure to those interested in the same projects and concepts. BoFs can be organized for individual projects or broader topics (best practices, open data, standards, etc.). BoFs are entirely up to you. We post your topic and provide the space and time. You provide the engaging topic.
      8:15am-8:45am (30m)
      Wednesday Speed Networking
      Meet us before the opening keynotes on Wednesday morning and get to know fellow attendees in quick, 60-second discussions.
      8:00am-9:00am (1h)
      Break: Coffee
      10:45am-11:35am (50m)
      Break
      3:05pm-3:50pm (45m)
      Break