Build resilient systems at scale
May 27–29, 2015 • Santa Clara, CA

IPSec mesh network: Perfect for the cloud?

Douglas Barth (Stripe)
4:10pm–4:50pm Thursday, 05/28/2015
Location: Ballroom GH
Average rating: 4.50 (6 ratings)

Prerequisite Knowledge

This talk goes deep on IPSec and how we implemented a mesh network using standard Linux tools. Attendees should be comfortable talking about cipher suites, network addresses, and kernel bugs.


Private networks are traditionally assumed secure, and traffic crossing the public internet is secured via VPN. For data center-to-data center traffic, that involves a site-to-site tunnel with a VPN concentrator on either side of the encrypted tunnel.

This hub-and-spoke architecture is a simple way to protect network traffic over the internet, but making it highly available adds complexity. Multiple tunnels need to exist, with traffic load balanced across them. Automation must exist to detect down (and half-down) links and redirect traffic. The tunnels must also scale up as the traffic passing between data centers increases. And this solution does nothing to protect traffic on the private network, a network that is increasingly managed by cloud providers and shared with other companies.

When traffic is flowing over networks that we don’t manage (both over the WAN and the LAN), it is time to rethink our network security practices. By using DevOps practices in our network systems, PagerDuty was able to get rid of the hub-and-spoke model and instead use an IPSec mesh architecture. Each server in our system establishes a security association with each of its peers and transmits all traffic using IPSec transport mode. Each host manages encryption and decryption of its own traffic, so our ability to protect that traffic naturally scales up as we add new infrastructure.
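To make the shape of the mesh concrete, here is a minimal sketch of the peer relationships it implies. It is not PagerDuty's tooling: the function names `mesh_peers` and `mesh_pairs` and the example addresses are hypothetical, and the code only enumerates which hosts would need to pair up. It does not configure IPSec. All addresses are assumed to be of the same IP family.

```python
from ipaddress import ip_address
from itertools import combinations


def _normalize(hosts):
    """Parse, de-duplicate, and sort host addresses (same IP family assumed)."""
    return sorted({ip_address(h) for h in hosts})


def mesh_peers(hosts):
    """Map each host to every other host in the fleet (its mesh peers)."""
    addrs = _normalize(hosts)
    return {str(a): [str(b) for b in addrs if b != a] for a in addrs}


def mesh_pairs(hosts):
    """List each unordered host pair once; n hosts give n * (n - 1) / 2 pairs."""
    return [(str(a), str(b)) for a, b in combinations(_normalize(hosts), 2)]


if __name__ == "__main__":
    # Hypothetical fleet of four servers on a private network.
    fleet = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
    for host, peers in mesh_peers(fleet).items():
        print(host, "->", ", ".join(peers))
    print(len(mesh_pairs(fleet)), "host pairs in total")
```

Each host only tracks its own n - 1 peers, which is why encryption capacity grows with the fleet. The total number of pairs grows quadratically, though, so generating this peer list by hand quickly becomes impractical and is a natural job for configuration management.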

This talk will focus on how we implemented that model on our Linux fleet. We will dig into the details of our configuration including the policies we use, and the encryption and authentication mechanisms in place. We will talk about how this model performs on our systems and the impact it has on the production workload. Finally, we will discuss how it handles failure, bugs we’ve found along the way, and how we see this model changing as our infrastructure continues to grow.

In the end, I hope to have given everyone a better understanding of how VPNs work, and how, by combining the development and operations disciplines, we can produce a solution that was previously considered impractical.


Douglas Barth


Doug Barth is a software generalist who currently finds himself doing operations work at PagerDuty. Prior to joining PagerDuty, he worked for Signal in Chicago and Orbitz, an online travel company. He loves beer, foosball, and Tool. You can follow him on Twitter @dougbarth.