Build resilient systems at scale
May 27–29, 2015 • Santa Clara, CA

How LinkedIn used RUM to drive optimizations and make the site faster

Ritesh Maheshwari (LinkedIn)
1:45pm–2:25pm Thursday, 05/28/2015
Location: Ballroom AB
Average rating: 4.62 (8 ratings)

Prerequisite Knowledge

Real user monitoring, basic JavaScript

Description

LinkedIn, like many web companies, uses PoPs (points of presence) to terminate TCP connections closer to end users, which reduces content download time. In this talk, we describe techniques LinkedIn uses to make better use of PoPs through new RUM measurements and in-depth data analysis.

We analyze the following questions and show how JavaScript code combined with RUM data can help answer them and improve performance:

  • Given a set of PoPs spread across the world, which set of end users should be routed to PoP A vs PoP B?
  • What is a better strategy for PoP selection: DNS GSLB or Anycast?
  • Where should we build the next set of PoPs?
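One way to frame the last question is sketched below in Python. It assumes browsers have timed a small download from a test host at each candidate location as well as from the PoP they use today, and it ranks candidates by the median time saved, weighted by page views. The function name, the data layout, and the candidate and region names are all hypothetical; this illustrates the idea and is not the method presented in the talk.

```python
from statistics import median

def rank_candidates(current_ms, candidate_ms, page_views):
    """Rank candidate PoP sites by estimated total time saved.

    current_ms:   {region: [millis, ...]} measured against today's PoP
    candidate_ms: {candidate: {region: [millis, ...]}} measured against test hosts
    page_views:   {region: count} used as weights
    All three layouts are assumptions made for this sketch.
    """
    scores = {}
    for candidate, per_region in candidate_ms.items():
        saved = 0.0
        for region, values in per_region.items():
            if region not in current_ms:
                continue  # no baseline to compare against
            gain = median(current_ms[region]) - median(values)
            # A candidate only helps regions where it is faster than today.
            saved += max(gain, 0.0) * page_views.get(region, 0)
        scores[candidate] = saved
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical measurements, in milliseconds
current_ms = {"IN": [400, 420, 380], "AU": [300, 320]}
candidate_ms = {
    "mumbai": {"IN": [120, 140, 130], "AU": [350, 330]},
    "sydney": {"IN": [390, 410], "AU": [80, 100]},
}
page_views = {"IN": 1000, "AU": 200}

ranking = rank_candidates(current_ms, candidate_ms, page_views)
```

Weighting by page views favors candidates that help the most traffic; a real decision would also weigh cost, capacity, and tail latency, which this sketch ignores.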

This talk will cover the following topics:

  • How PoPs help performance
  • Why accurate user-to-PoP assignment is important for performance
  • How to utilize your end-user browser sessions to emulate millions of “measurement agents” spread across the world
  • How to use the above technique, along with RUM data, to:
    • Measure which PoP is optimal for your end users
    • Analyze inefficiencies in DNS load-balancing techniques
    • Compare IP Anycast and DNS GSLB as PoP selection techniques
    • Decide where to build the next PoP from a candidate list of multiple geographic locations
    • Answer other relevant network performance questions
  • Why every web company should use RUM to drive performance optimizations
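As a rough illustration of the measurement idea in the list above, the Python sketch below assumes each browser session has already reported download times to several PoPs as (region, PoP, milliseconds) samples. It picks the PoP with the lowest median time for each region, then reports how often a DNS-assigned PoP differs from that choice. The record layout, PoP names, and helper names are hypothetical and are not taken from LinkedIn's implementation.

```python
from collections import defaultdict
from statistics import median

def best_pop_by_region(samples):
    """Return {region: pop}, choosing the PoP with the lowest median time.

    `samples` is an iterable of (region, pop, millis) tuples; this layout is
    an assumption made for the sketch.
    """
    timings = defaultdict(list)
    for region, pop, millis in samples:
        timings[(region, pop)].append(millis)

    best = {}
    for (region, pop), values in timings.items():
        med = median(values)
        if region not in best or med < best[region][1]:
            best[region] = (pop, med)
    return {region: pop for region, (pop, _) in best.items()}

def suboptimal_share(assignments, best):
    """Fraction of page views whose assigned PoP is not the measured best.

    `assignments` is an iterable of (region, assigned_pop) pairs.
    """
    assignments = list(assignments)
    if not assignments:
        return 0.0
    misses = sum(1 for region, pop in assignments if best.get(region) != pop)
    return misses / len(assignments)

# Hypothetical RUM samples: (region, pop, download time in ms)
samples = [
    ("IN", "pop-sg", 180), ("IN", "pop-sg", 220), ("IN", "pop-sg", 200),
    ("IN", "pop-us", 420), ("IN", "pop-us", 390),
    ("BR", "pop-us", 210), ("BR", "pop-us", 190),
    ("BR", "pop-sg", 600),
]
best = best_pop_by_region(samples)

# Hypothetical DNS-based assignments observed for four page views
assignments = [("IN", "pop-us"), ("IN", "pop-sg"), ("BR", "pop-us"), ("BR", "pop-us")]
share = suboptimal_share(assignments, best)
```

Medians are used here because RUM timings tend to have long tails; grouping by region is a simplification, and a finer key such as network or resolver could be substituted.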

Attendees of this talk will learn how to utilize their (potentially millions of) end-user visits to do performance measurements, distributed across the globe, with results that are more accurate and realistic than synthetic measurements. Attendees will also be able to directly apply this knowledge to instrument RUM on their site, and use that data to make better optimization decisions.

In the end, we would like to encourage web companies to use RUM data from their users not only for performance monitoring but also to drive optimizations.


Ritesh Maheshwari

LinkedIn

Ritesh is currently a performance engineer at LinkedIn, working on making LinkedIn fast through network optimizations, RUM, and automation. Before LinkedIn, Ritesh was a performance engineer at Akamai, solving performance issues in Akamai’s global distributed network. Ritesh holds a Ph.D. in computer science from Stony Brook University, where he got passionate about performance while working on computer networks. Earlier, he finished his BTech in computer science and engineering from the Indian Institute of Technology, Kharagpur.


Comments

Randy Schnedler
05/28/2015 12:51pm PDT

Appreciate the quick response, Ritesh. I agree 90% is not wonderful, and the resolvers ignoring the TTLs can have more catastrophic results than just bad performance. Ultimately though, as you pointed out in the talk, the geolocation doesn’t matter and you just want to route each client to your front door via the fastest network path possible. There is clearly still room for improvement in various segments of the app design and delivery process.

Ritesh Maheshwari
05/28/2015 8:20am PDT

Hi Randy,

Thanks!

1. CDNs and many other major web companies usually run their own DNS. They then do some clever tricks to tie back each DNS query with a page view. These tricks can only be done if you have highly configurable DNS servers and access to a lot of data.

2. You are right that our location identification may be wrong as well. I do want to clarify that when I say Geo-IP mapping is poor, my standards are pretty high: the mappings are actually close to 90% accurate most of the time. We can’t quantify it, but my guess is that most bad DNS-based PoP assignment happens due to bad resolvers and (I forgot to mention this in the talk) resolvers not honoring TTLs.

Randy Schnedler
05/28/2015 7:37am PDT

Hi Ritesh. Nice presentation.

I am curious to know what the CDNs are doing to solve the problem of poor Geo-IP mappings.

Also, for your RUM data, how did you determine the country of origin for each client, given that the Geo-IP mapping databases are so poor?

Ritesh Maheshwari
05/27/2015 7:40am PDT

This talk will be a story of how LinkedIn utilized RUM and PoPs (Points of Presence) to improve network performance. Also, even though the description lists “Javascript” and “RUM” knowledge as prerequisites, I have updated the talk to be self-contained, so it no longer needs those prerequisites.

Ritesh Maheshwari
05/27/2015 7:32am PDT

I will be watching this space, so feel free to post any questions/comments!