Most people working with CDN caches know about the Vary header, but few properly understand what it really does. And with the advent of the Key header, new patterns for varying cache content will emerge. Andrew Betts shares common and advanced use cases for Vary, such as language, A/B testing, compression, and service worker support, and outlines potential changes to consider when Key arrives.
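As a rough illustration of what Vary does (not taken from the session itself): a cache that honors Vary stores a separate copy of a response for each distinct combination of the request headers the response names. The sketch below is a minimal model of that idea; the function name `cache_key` and the key layout are hypothetical, and real caches also normalize header values and treat `Vary: *` specially, which this sketch does not attempt.

```python
def cache_key(url, request_headers, vary):
    """Build a cache key from the URL plus the request headers named in Vary.

    Hypothetical helper for illustration only; not any CDN's actual algorithm.
    """
    # Header names are case-insensitive, so lowercase them before lookup.
    headers = {name.lower(): value.strip() for name, value in request_headers.items()}
    # Vary is a comma-separated list of request header names.
    varied = [name.strip().lower() for name in vary.split(",") if name.strip()]
    parts = [url]
    # Sort so that "A, B" and "B, A" produce the same key.
    for name in sorted(varied):
        parts.append(f"{name}={headers.get(name, '')}")
    return "|".join(parts)


english = cache_key("/home", {"Accept-Language": "en", "User-Agent": "a"}, "Accept-Language")
french = cache_key("/home", {"Accept-Language": "fr", "User-Agent": "a"}, "Accept-Language")
other_agent = cache_key("/home", {"Accept-Language": "en", "User-Agent": "b"}, "Accept-Language")
```

Here `english` and `french` differ, so each language gets its own cached copy, while `other_agent` equals `english` because User-Agent is not listed in Vary.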
Sara-Jane Dunn discusses an entirely different paradigm of computing: the information-processing carried out by cells. Focusing on examples from cutting-edge stem cell research, Sara shares formal techniques from computer science that allow us to peer into the inner workings of biology, make sense of the earliest stages of development, and even program cells for use in therapy.
Although the blockchain is technically a distributed system, there has been a surprising lack of interest from the distributed systems community. Catherine Mulligan explores the implications of the blockchain for distributed systems and explains what needs to be addressed in order to build and maintain them effectively.
As Expedia refactors its backend services into a finer-grained microservice architecture, frontend applications have begun to be split into smaller applications serving a small number of pages or content on the website. Rick Fast details how Expedia is creating an extremely configurable, self-service edge architecture for routing between frontend applications and managing bot traffic.
Can we make developers care about operations? Jürgen Cito shares real-world experience of developers struggling with operations and details a journey to incorporate runtime performance aspects into the developer's daily workflow and reduce performance problems reaching production.
Soo Choi (DevOps Research and Assessment)
Soo shares her experiences as a woman in tech. Even though she worked for NASA and co-founded her own successful company, rampant sexism in IT and bad experiences speaking in public nearly destroyed her career. She will examine common constructs about diversity and propose ideas to bring productive change to continue to build upon the solid foundation of inclusion we have created.
Your organization wants to go cloud native, but you don't want to hit the headlines as the victim of the latest hacking scandal. Liz Rice addresses the questions you need answers to: Will your deployments be less secure or more? How do DevOps processes like CI/CD and cluster orchestration affect your security profile? And what can we all do to minimize the risk of exploits?
There are two sides to monitoring: exposing problems and taking action to resolve them. Most monitoring systems handle the first, but Consul handles both. Seth Vargo explains how Consul enables self-healing infrastructure. By coupling service discovery with monitoring, Consul is able to intelligently route traffic away from unhealthy hosts or fail over to geographically different data centers.
Harry Winser explains how to leverage consumer-driven contracts to achieve fully independent releases of microservices across teams and how to handle a service rollback while still serving over 47 million requests a day. Harry also demonstrates how to use the Pact framework to continuously deliver services that depend on one another and Docker to make developer testing easier.
With ever-increasing demands for fast business change, how can we ensure our digital channels reflect the exacting standards of performance our customers (and business owners) expect? What does this look like in an age of DevOps and continuous delivery? Thomas Barns and John Pillar share a strategy for shifting left and automating performance analysis.
DDoS mitigation is an ever-evolving art. Architectures change, attackers get more creative, and keeping your team and tools ahead of the curve is a constant battle. So why not make DDoS preparedness fun as well as practical? Shannon Weyrick explains why you should use DDoS war games to keep your team’s skillset polished, their tools in top shape, and their spirits and confidence high.
Forget Conway's law. In distributed systems, Murphy’s law rules: Everything that can go wrong will go wrong. Anne Currie discusses common failure modes, how to approach diagnosing highly complex issues, and what we can learn from detectives like Sherlock Holmes, Hercule Poirot, and Miss Marple.
Edge computing is a hot topic, but despite all the hype, there are still some major hurdles to overcome before it reaches its full potential. Tyler McMullen outlines the technical and economic challenges and explains how we can get past them.
Peter Bourgon and Sean Braithwaite offer an overview of microservices and data pipelines, explaining how both systems reflect the organizations and people that build them (in adherence to Conway’s law) and can be well understood in terms of their relationship to change and time. You'll learn the virtues and vices of each architecture and get enough context to apply them coherently.
By failing to prepare, you are preparing to fail. Your risk mitigation strategy must layer the most cost-efficient strategies to effectively mitigate or reduce the adverse effects of failure. Manuel Alvarez explores using the CDN as a failover tool, reviewing use cases and demonstrating how to decide whether to use a CDN by evaluating costs, benefits, operations, and time to mitigate.
Microsoft has widely deployed field-programmable gate arrays (FPGAs) for accelerating search, networking, and machine learning—with a little help from Intel's software expertise and its FPGA programmers. Mike Strickland explains how a single FPGA can deliver significant acceleration for multiple workloads.
Docker offers many advantages, simplifying both development and production environments. But there is still uncertainty around the security of containers. Ben Hall answers the question "How secure are Docker containers?" by exploring Docker's security model, its limitations, and how to handle them.
Did you read the O’Reilly book about Google SREs but doubt that SRE will work for your more traditional or more regulated company? Janna Brummel and Robin van Zijll explain how they implemented SRE in a global financial organization, providing an overview of methods and technologies and sharing lessons learned from a year of doing SRE.
HTTP/2 (or H2, as the cool kids call it) has been ratified for months, and browsers already support it. But do the exciting features that HTTP/2 offers meet expectations? Frederik Deweerdt explores how HTTP/2 fares in the real world, how browser behavior is changing to accommodate new server-side functionality, and how you can get the most out of the new protocol everybody’s talking about.
Software development is a social activity that favors direct human contact, yet 21st century life can often get in the way, forcing us to reconsider our communication patterns. Daniel Young and Emma Jane Hogbin Westby explore how to build and maintain happy productive teams, regardless of geography.
Understanding the state of a running application is the key to efficiently troubleshooting production issues and ultimately anticipating outages. Pierre Vincent demonstrates how to make monitoring an integral part of development, using health checks, metrics, tracing, and other patterns to get a clearer picture of applications in production.
You rely on Jenkins to manage the full stack of your continuous delivery pipeline, but why shouldn’t Jenkins itself be software defined, ephemeral, and available at the push of a button? Mandy Hubbard explains how Care.com uses a customized, script-based startup process and Joyent’s ContainerPilot with just a few edits to a Docker Compose _env file to launch Jenkins in a Docker container.
What happens when Tech for Good and human-centered design actually support the needs of their end users? Laura Hackney explores the pitfalls and successes of the movement to bring social justice work into the technology landscape. Laura also shares insights from AnnieCannons, her nonprofit dedicated to transforming survivors of human trafficking into software professionals.
Episource just finished building a scalable, resilient serverless distributed data pipeline for coding medical charts using NLP, which scales seamlessly with the amount of data it takes in as input. Raj Rohit explores the system and the tools used to build it, such as Ansible, Lambda, and Terraform, and shares the pitfalls, failures, successes, and lessons learned along the way.
Emile Vauge explains how to effectively manage inbound network traffic in your container-based infrastructure with Traefik, a modern reverse proxy and load balancer made to deploy microservices with ease.
Using real-world metrics data from thousands of organizations, Jason Yee explores the latest trends in container adoption and use, shares data on what types of applications organizations are running in containers, and explains how to best monitor these containerized applications.
Vasia Kalavri offers an overview of Strymon, a system for predictive data center analytics, and its online critical path analysis module. Strymon analyzes live traces from distributed dataflow systems like Apache Spark, Apache Flink, and TensorFlow to predict bottlenecks and provide insights on streaming application performance.
A new approach to data analytics acceleration is delivering benchmarked performance increases of 3X to 10X+ at the system level for traditional relational and NoSQL databases.
Matthew Skelton shares five practical, tried-and-tested techniques for improving operability with many kinds of software systems, including the cloud, serverless, on-premises, and the IoT.
Distributed systems used to be the exception, but today they're the norm, so it's more useful than ever to be able to quantify scalability. Baron Schwartz explains how to use the Universal Scalability Law to characterize how your systems truly behave, why they don't scale like they should, and how to improve them. It's a simple, elegant solution, and, although formal, it requires no math.
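For readers who do want to see the formula, the Universal Scalability Law is commonly written as X(N) = λN / (1 + α(N − 1) + βN(N − 1)), where α models contention and β models coherency (crosstalk) cost. The sketch below simply evaluates that expression; the function names and the coefficient values are made up for illustration and are not taken from the session.

```python
import math


def usl_throughput(n, lam, alpha, beta):
    """Predicted throughput at concurrency n under the Universal Scalability Law.

    lam   -- throughput of a single worker (n == 1)
    alpha -- contention (serialization) coefficient
    beta  -- coherency (crosstalk) coefficient
    """
    return lam * n / (1 + alpha * (n - 1) + beta * n * (n - 1))


def peak_concurrency(alpha, beta):
    """Concurrency at which predicted throughput peaks (requires beta > 0)."""
    return math.sqrt((1 - alpha) / beta)


# Illustrative coefficients only; in practice they are fitted to measurements.
LAM, ALPHA, BETA = 100.0, 0.19, 0.01

for workers in (1, 4, 9, 16):
    print(workers, round(usl_throughput(workers, LAM, ALPHA, BETA), 1))
print("peak at about", round(peak_concurrency(ALPHA, BETA), 2), "workers")
```

With α = β = 0 the model reduces to linear scaling; a nonzero β is what makes throughput eventually fall as more workers are added.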
Uwe Friedrichsen explores the challenges, options, and trade-offs of different consistency models in distributed system landscapes, covering the limitations of ACID transactions, eventual consistency, and current research that tries to fill the gaps between ACID and BASE transactions.
Alexander Akbashev explains how his company scaled a single-instance Jenkins master from 20K builds per day to 140K using AWS services (EC2, S3, Memcache, etc.). Everything done to achieve this result was open sourced and upstreamed.
Christopher Meiklejohn is building an application that helps users select a bottle of wine based on the wines that they enjoy, using a new programming language called Martinelli. Christopher offers an overview of Martinelli, highlighting the key features of this new language that allow the fault-tolerant, highly scalable operation of his application.
Have you ever had to monitor the health of your service (server stats, application errors, etc.)? What if you had to monitor the cloud, with its hundreds of thousands of servers? Alerts can create noise and spam your team. Mihai Bojin and Kamil Smuga explain how Salesforce approaches monitoring at scale by putting customers first.
A developer hunting for a bug is like a doctor hunting for an illness. She does not need complete understanding of the body for the hunt to be successful. Jasvir Nagra and Marianna Bezler share a few painful distributed web app debugging anecdotes and an alternate approach using virtualization and visualization to get a holistic view of a program to track down elusive bugs.
Do you have an old monolith you really want to rewrite, but don’t know where to start? Dalia Simons shares ideas, tips, and strategies for rewriting an important monolith service into microservices while maintaining full availability.
Kavya Joshi shares strategies to prepare systems for flux and scale. Drawing from a range of use cases, including Facebook’s Kraken, which provides shadow traffic, and Samsara's custom load simulator, Kavya demonstrates how to improve your understanding of your systems as they run today and plan for how they'll run tomorrow.
Angie Jones explains how to build stability and credibility into your continuous integration tests so that your team is able to receive the fast feedback it needs for Agile development.
Chaos engineering is intentionally injecting failure into a system to proactively identify and fix problems before they cause outages. It’s an emerging discipline, but its roots are decades old. Kolton Andrus explores the evolution of chaos engineering, how to begin your journey toward resilient systems, and how to make those pagers quit buzzing at 3:00am.
The State of DevOps Report has shown that high-performing IT teams decisively outperform low-performing peers (with greater throughput and stability), creating value that shows up on the bottom line. Nicole Forsgren and Nigel Kersten share insights into the key leadership, technical, architectural, and product capabilities that drive these outcomes.
The popularity of Git and GitHub has led to an explosion in the number of software repositories. But is creating a new repository always the right approach? Gareth Rushgrove offers an overview of the monorepo—putting all your product's or organization's code in a single repository—covering the advantages of monorepos and the tools to help maintain them.
Chris Jackson explains how 175-year-old company Pearson built a tech startup within the enterprise with the aim of innovating the developer experience. Chris shares the journey from inception to B-round funding and explains how this startup is establishing the foundation of the company's future.
Last year, Mindaugas Mozūras's company was in dire straits. Its strategy was not working. All the key metrics were drifting downward. People left. The company even did a reorg. During this time, he had many last conversations—sometimes trying to stop people from leaving, other times to let them go. Mindaugas relates three such conversations, sharing lessons on honesty and delivering bad news.
Keeping your signal-to-noise ratio high is a nontrivial problem. Modern tools make it easy to overmonitor (which leads to noise). The result? Missed alarms and unhappy customers. Filtering the noise is not the answer. Kishore Jalleda explains how Yahoo reduced the alert volume from ~200K a month to a few hundred by creating the right incentives and culture.
Welcome to the world of nanoservices: smaller than a microservice, bigger than a function, they are the perfect unit of software. Nanoservices are flexible, manageable, and scalable and a great way to do serverless computing. Matthew Clark explains how to get nanoservices right, drawing on his experience at the BBC, which now has over a thousand in production.
Feeling overwhelmed by huge amounts of data has become the norm. Creating effective visual representations of data offloads some of the work of quickly finding interesting patterns to our powerful perceptual system. Miriah Meyer explores the role that interactive visualizations can play in helping us find meaning in mounds of data and discusses the limitations of this approach.
What are your perceptions of NHS IT? Not great? Well, the truth is very different from what you might expect. Ed Hiley and Dan Rathbone offer an overview of the technical renaissance going on in parts of the NHS, where things are being done in a modern way.
In a containerized deployment, how do you safely pass secrets like passwords and certificates between containers without compromising their safety? If orchestration means a container can run on any machine in the cluster, how do you minimize who knows your secrets? Liz Rice explores the risks and shares best practices for keeping your secrets safe.