Build resilient systems at scale
May 27–29, 2015 • Santa Clara, CA

Performance beyond caching

Nena Caviness | @nenacaviness

Last month, Jason Yee met with Fastly’s Senior Professional Services Engineer and Varnish Wizard Rogier “DocWilco” Mulhuijzen, and Lead Customer Engineer and Cat Herder, Austin Spires. They talked about everything from Varnish to CDNs and understanding performance. A lot of interesting things came up. Here are the highlights of that discussion:

How is Varnish different from other caching software?

Varnish implements a subset of the Edge Side Includes (ESI) language, which allows fragments of a page to be cached, or not cached, separately from the rest of the page. Imagine, for instance, the classic example of an ecommerce website: there's always a little shopping cart in the upper right corner. It would be amazing if you could cache the product pages but not the shopping cart, so that you can serve the product page to everyone who visits it and have the shopping cart updated separately. These days, a lot of people do that with AJAX calls back to the backend, but it's also possible to do it with ESI.

Another feature that we really like is VCL, the Varnish Configuration Language. Instead of a config file in which you set some flags or paths, it's an actual programming language, albeit a very domain-specific one. When you're working with HTTP, VCL is really powerful. On top of that, VCL is compiled to C and then to a shared library, which is loaded into the Varnish process. Every request is processed through that little library, and because it's all machine code, it's very fast.
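To make the shopping-cart example concrete, here is a minimal sketch of how it might look. The URLs, markup, and routing rules are illustrative assumptions, and the VCL is written in the Varnish 3 style that was current at the time (`vcl_fetch`, `beresp.do_esi`):

```html
<!-- product.html: the page body is cached; the cart fragment is
     assembled at the edge on every request (URLs are hypothetical). -->
<html>
  <body>
    <div id="product"> ... cacheable product details ... </div>
    <esi:include src="/fragments/cart" />
  </body>
</html>
```

```vcl
sub vcl_fetch {
    # Tell Varnish to scan product pages for ESI tags.
    if (req.url ~ "^/products/") {
        set beresp.do_esi = true;
    }
    # Keep the per-user cart fragment out of the cache.
    if (req.url ~ "^/fragments/cart") {
        return (hit_for_pass);
    }
}
```

The cached product page and the uncached cart fragment are then stitched together on each request at the edge, rather than in the browser.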

Is there a good way to think of CDNs beyond just “cache as a service”? What do CDNs offer that goes beyond a regular web developer running their own Varnish?

The main differentiator is globalization. Your website or mobile application will see improved performance if you can push content out closer to your end users. We like to say that when it comes to content delivery, CDNs beat the speed of light. This especially matters when you take encrypted connections, such as TLS, into consideration. If you don't have an existing connection to the server, a request is going to take four round trips. If you're coming from Europe and getting routed to a server on the West Coast of the US, that's going to end up being almost 500 milliseconds for a single request. If you put that through a CDN (and don't use any caching at this point), you're still going to get that down to about 200 milliseconds in the worst-case scenario. That's a big, big benefit of using a CDN, even for your dynamic content.
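The numbers above can be reproduced with some back-of-the-envelope arithmetic. The round-trip times below are illustrative assumptions, not measurements:

```python
# Rough latency model for a cold HTTPS request (all values assumed).
RTT_EU_TO_US_WEST_MS = 125   # assumed Europe <-> US West Coast round trip
RTT_EU_TO_LOCAL_EDGE_MS = 20  # assumed round trip to a nearby CDN edge

# A cold HTTPS request needs roughly four round trips: the TCP
# handshake, two TLS handshake round trips, and the request itself.
COLD_ROUND_TRIPS = 4

# Going straight to the distant origin, every round trip is a long one.
direct_ms = COLD_ROUND_TRIPS * RTT_EU_TO_US_WEST_MS

# With a CDN terminating TLS at a nearby edge, the handshakes happen
# over the short hop; the edge reuses a warm connection to the origin,
# so only the request itself crosses the ocean.
via_cdn_ms = COLD_ROUND_TRIPS * RTT_EU_TO_LOCAL_EDGE_MS + RTT_EU_TO_US_WEST_MS

print(direct_ms, via_cdn_ms)  # 500 205
```

Under these assumptions the direct request costs about 500 ms and the CDN-routed one about 200 ms, matching the figures quoted above, before any caching is involved at all.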

If you start using a CDN with fast purging, you can cache static content as well as event-driven content. You can cache that type of content locally to all of your users, which will make the content perform really fast. When you update your content through a purge, it should immediately hit all the cache servers as well.

What is event-driven content?

The web is evolving. In the early days, most websites only needed to deliver static content, like CSS, images, and JavaScript files. As the web and online experiences have become more complex, the content companies need to deliver has grown to include APIs, dynamic HTML, and rapidly changing personalized data. Dynamic content like this previously had to go back to the server because it involves highly unpredictable and sensitive information that requires logic, and it has traditionally been uncacheable by CDNs. But a large portion of that unpredictable dynamic content, called event-driven content, is actually cacheable. This type of content includes things like news headlines or articles, sports scores, stock prices, or shop inventory for an ecommerce site: content that can remain static for an indefinite period of time, but changes suddenly. Modern CDNs are able to treat event-driven content the same way you'd treat static content, purging and caching it at the edge.

For example, take a wiki for a popular TV show. Right around the time an episode airs, fans update the wiki pages with new information about plot changes, characters, and so on. Then, for about a week, there's not a high rate of change. If you were to put that page on a CDN and purge immediately as changes occur, every edit would be visible to readers in real time. Then, when edits stop, the page stays cached until the next edit.
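The wiki flow above can be sketched in a few lines. This is a toy model, with a dict standing in for the CDN's edge cache and hypothetical URLs; a real setup would issue a purge request to the CDN's API instead:

```python
# Event-driven caching sketch: cache until the next edit, purge on edit.
edge_cache = {}  # url -> rendered page (stands in for the CDN edge)
origin = {"/wiki/ep1": "Episode 1: original summary"}  # backing store

def fetch(url):
    """Serve from the edge cache; on a miss, fill it from the origin."""
    if url not in edge_cache:
        edge_cache[url] = origin[url]  # cached until the next purge
    return edge_cache[url]

def save_edit(url, new_body):
    """Origin update plus immediate purge, so readers never see stale content."""
    origin[url] = new_body
    edge_cache.pop(url, None)  # the purge

fetch("/wiki/ep1")  # miss: page is cached from the origin
save_edit("/wiki/ep1", "Episode 1: updated after airing")
print(fetch("/wiki/ep1"))  # Episode 1: updated after airing
```

Between edits every read is a cache hit; the only origin traffic is the first read after each purge.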

Applications and websites are not static anymore. Applications involve user interaction, either with a mobile device or with a desktop site, and that interaction impacts other users. It's what the internet has become. The changes that individuals make on a site or application impact the entire system. When you're caching applications like these, proper caching and invalidation are essential. There's increasingly more event-driven content to factor in when architecting your application.

How has the micro-services movement changed caching?

Common examples of event-driven content are user-based (editing a news article, for example), but machine- or computation-based changes make up a significant portion of it as well. That's where the current micro-services movement fits in. An event-driven service could be rooted in human activity, or it could be rooted in an external event. When architecting new services to leverage micro-services, cache invalidation and proper cache management matter even more.

As we mentioned, what CDNs are used for is changing — people are starting to think about using CDNs for whole pages. The next step is to realize that API calls are just HTTP requests, and their responses are just objects, so you can use a CDN for that as well. When you go down to the micro-service level, things are very manageable. Because of the smaller scale, it’s easier to keep track of relationships and caching rules. For example, if a change impacts a specific piece of data within a specialized micro-service, you know exactly what to purge from your CDN.
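One common way to get that precise purging is to tag cached responses with keys (often called surrogate keys) and invalidate by key. The sketch below is a toy in-memory version with hypothetical URLs and key names, not any particular CDN's API:

```python
# Surrogate-key style purging: tag each cached response with the data
# it depends on, then purge exactly the responses affected by a change.
from collections import defaultdict

cache = {}                      # url -> cached response body
urls_by_key = defaultdict(set)  # surrogate key -> urls tagged with it

def store(url, body, keys):
    """Cache a response and record which keys it depends on."""
    cache[url] = body
    for key in keys:
        urls_by_key[key].add(url)

def purge_key(key):
    """Invalidate every cached response tagged with this key."""
    for url in urls_by_key.pop(key, set()):
        cache.pop(url, None)

store("/api/products/42", '{"price": 10}', keys={"product-42"})
store("/api/catalog", '[...]', keys={"product-42", "catalog"})
store("/api/users/7", '{"name": "a"}', keys={"user-7"})

purge_key("product-42")  # a price change in the product service
print(sorted(cache))     # ['/api/users/7']
```

When each micro-service owns its own keys, a change in one service purges only the responses built from that service's data, leaving the rest of the cache warm.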

What are some important things to consider when selecting a CDN?

Configuration. You need to be able to influence your configuration in a very direct way. Also consider how you can prepare your origin to better leverage a CDN's abilities. Consider setting proper Cache-Control headers and combining them with headers that set different cache times for the CDN, such as Edge-Control or Surrogate-Control. These settings are CDN-agnostic and shouldn't impact how you select a CDN, but they're definitely important, and they're great web standards to explore.
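As an illustration, an origin response might combine the two headers like this, giving browsers a short TTL while letting the CDN cache much longer (the values here are arbitrary examples):

```
HTTP/1.1 200 OK
Content-Type: text/html
Cache-Control: max-age=60
Surrogate-Control: max-age=86400
```

The surrogate (the CDN) consumes Surrogate-Control and typically strips it before forwarding, while downstream browsers see only the one-minute Cache-Control lifetime.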

What bit of advice would you give to people who want to explore HTTP performance?

Evaluate your favorite frameworks and tools. For example, Rails, WordPress, Django: they all have similar but distinct caching behaviors. Explore those, understand how they're going to affect your application's performance, and build a deeper understanding of your framework and of HTTP in general. That's the first step to getting your feet wet with HTTP performance.

* * *

Doc & Austin are teaching an intensive, day-long training on CDNs at Velocity this May. Learn more »

Tags: performance, cdn, varnish
