Build & maintain complex distributed systems
October 1–2, 2017: Training
October 2–4, 2017: Tutorials & Conference
New York, NY

How LinkedIn determines the capacity limits of its services using live traffic

Susie Xia (LinkedIn), Anant Rao (LinkedIn)
4:45pm–5:25pm Tuesday, October 3, 2017
Average rating: 4.67 (6 ratings)

Who is this presentation for?

  • Performance and capacity engineers

Prerequisite knowledge

  • A basic understanding of service-oriented architecture, performance engineering methodologies and metrics, and capacity planning

What you'll learn

  • Understand how LinkedIn leverages current live production traffic to scientifically stress test services and determine inefficiency root causes without impacting site and user experiences
  • Learn how to architect a system to automatically identify and anticipate resource utilization bottlenecks using live traffic
  • Explore tips and techniques for leveraging live traffic tests for capacity planning and identifying how much your data center is over- or underprovisioned

Description

Modern web services like LinkedIn are made up of hundreds of microservices running in geographically distributed data centers. Each microservice must be allocated capacity carefully so that data center resources are used efficiently. However, for rapidly growing web services like LinkedIn, it is hard to determine service capacity limits accurately and to give resource allocation guidance, because the traffic shape changes constantly, the infrastructure is heterogeneous, and the bottlenecks evolve.

Susie Xia and Anant Rao explain how LinkedIn achieves automated capacity measurement and headroom analysis at scale via a system called Redliner, which runs load tests by shifting live user traffic to target service instances in real production environments, helping reduce data center costs, execute proactive capacity planning, and detect performance regressions in development cycles. Susie and Anant also share lessons learned in building and maintaining Redliner and tips on how you can use your current service-oriented architecture to do the same.
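The general idea of ramping load on a single target instance until a limit is breached can be sketched as follows. This is only an illustrative sketch, not Redliner's actual algorithm or API: the function names, the latency threshold, the step size, and the toy latency model are all assumptions made for the example, and a real system would measure live instances and shift traffic back as soon as the limit is hit.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RampResult:
    max_safe_qps: Optional[float]  # highest load that stayed within the limit
    steps_run: int                 # number of load levels that were measured


def find_capacity_limit(
    measure_latency_ms: Callable[[float], float],  # hypothetical probe
    baseline_qps: float,
    latency_limit_ms: float,
    step_qps: float = 50.0,
    max_qps: float = 5000.0,
) -> RampResult:
    """Increase load step by step until latency exceeds the limit."""
    max_safe = None
    qps = baseline_qps
    steps = 0
    while qps <= max_qps:
        steps += 1
        if measure_latency_ms(qps) > latency_limit_ms:
            break  # a real test would shift the extra traffic away here
        max_safe = qps
        qps += step_qps
    return RampResult(max_safe, steps)


def headroom_fraction(capacity_qps: float, peak_qps: float) -> float:
    """Share of measured capacity that is unused at peak traffic."""
    return (capacity_qps - peak_qps) / capacity_qps


def toy_latency_ms(qps: float) -> float:
    # Hypothetical model: flat 20 ms up to 400 QPS, then linear growth.
    return 20.0 + max(0.0, qps - 400.0) * 0.5


if __name__ == "__main__":
    result = find_capacity_limit(toy_latency_ms, 100.0, 50.0, step_qps=50.0)
    print(result)
    print(headroom_fraction(result.max_safe_qps, 300.0))
```

With the toy model, the ramp stops at the first load level whose latency exceeds 50 ms, and the headroom figure compares the last safe level against an assumed peak of 300 QPS.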

Topics include:

  • Challenges in large-scale web service capacity planning
  • Debunking the myth of testing in production
  • Overview of Redliner: How LinkedIn determines the capacity limits of its services
  • Use cases: How Redliner helps improve service resource utilization efficiency
  • Promoting and building performance-driven architecture

Susie Xia

LinkedIn

Susie Xia is a senior software engineer at LinkedIn, where she focuses on scalability and capacity analysis. Previously, she worked on mobile applications and automation.

Anant Rao

LinkedIn

Anant Rao is an engineering lead at LinkedIn, where he works on performance optimization and capacity planning, focusing on making LinkedIn’s apps go fast and working on infrastructure to prevent performance issues before they make it to production.