Build Systems that Drive Business
June 11–12, 2018: Training
June 12–14, 2018: Tutorials & Conference
San Jose, CA

From dandelion to tree: Scaling Slack

Bing Wei (Slack)
3:40pm–4:20pm Wednesday, June 13, 2018
Systems Engineering & Architecture
Location: LL21 C/D Level: Intermediate
Secondary topics: Systems Architecture & Infrastructure

Prerequisite knowledge

  • A basic understanding of distributed systems

What you'll learn

  • How Slack scaled by rearchitecting its system with lazy loading, a publish/subscribe model, and an edge cache service

Description

Communication and collaboration platform Slack has been fortunate to experience exponential user growth since its launch in 2014. Slack was originally designed for small teams, and some of its powerful initial design decisions did not scale as the user base grew. They later became liabilities once the company had to support hundreds of thousands of users communicating at once.

By 2016, Slack faced a problem: the load on its backend servers had increased by 1,000×. In one incident, an entire team was knocked offline and couldn’t reconnect after its members uploaded thousands of emojis, a use case that nobody had anticipated. The resulting spike of events triggered a wave of client reconnections that cascaded into database failures.

Bing Wei explains how rearchitecting the system with lazy loading, a publish/subscribe model, and an edge cache service solved the problem with zero downtime while improving latency, reliability, and availability. Bing also discusses Slack’s ongoing effort to build a generalized publish/subscribe framework and how the company handles data synchronization between clients and backend servers, work that should further improve latency and reduce backend cost. She also compares her time at Slack with her experience on the Twitter infrastructure team, detailing how the companies’ approaches differ and what Slack could learn from other web-scale companies.
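
As a rough illustration of how the three ideas named above can fit together, the following is a minimal sketch and not Slack's actual implementation. The class names PubSub and EdgeCache, the "emoji" topic, and the dictionary standing in for the backend are all hypothetical. The cache loads a value only when a client first asks for it (lazy loading) and drops its copy when an update is published (publish/subscribe), so repeated reads never reach the backend.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class PubSub:
    """In-process stand-in for a publish/subscribe channel (hypothetical)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[str], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, key: str) -> None:
        # Notify every subscriber of the topic that `key` has changed.
        for handler in list(self._subscribers[topic]):
            handler(key)


class EdgeCache:
    """Loads values from the backend on first use and evicts them on update events."""

    def __init__(self, backend: Dict[str, str], bus: PubSub, topic: str) -> None:
        self._backend = backend          # assumption: a dict stands in for the database
        self._store: Dict[str, str] = {}
        self.backend_reads = 0           # counts how often the backend is actually hit
        bus.subscribe(topic, self._invalidate)

    def get(self, key: str) -> str:
        # Lazy loading: fetch from the backend only on a cache miss.
        if key not in self._store:
            self.backend_reads += 1
            self._store[key] = self._backend[key]
        return self._store[key]

    def _invalidate(self, key: str) -> None:
        # Drop the cached copy; the next get() reloads it from the backend.
        self._store.pop(key, None)


backend = {"emoji:party": "v1"}
bus = PubSub()
cache = EdgeCache(backend, bus, topic="emoji")

first = cache.get("emoji:party")     # miss: read from the backend
second = cache.get("emoji:party")    # hit: served from the cache
backend["emoji:party"] = "v2"        # the backend value changes...
bus.publish("emoji", "emoji:party")  # ...and the change is announced
third = cache.get("emoji:party")     # miss again: reloads the new value
```

In this sketch, three client reads cost only two backend reads, and the cache never serves the stale value after the update is published. A real edge cache would also need eviction limits, concurrency control, and handling for lost update events, all of which are omitted here.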


Bing Wei

Slack

Bing Wei is a software engineer on the infrastructure team at Slack, working on its edge cache service. Previously, she was at Twitter, where she contributed to the open source RPC library Finagle, worked on core services for tweets and timelines, and led the migration of tweet writes from a monolithic Rails application to JVM-based microservices.