Engineer for the future of Cloud
June 10-13, 2019
San Jose, CA

Improving reliability of your distributed data store

Mehant Baid (Dropbox)
11:35am-12:15pm Wednesday, June 12, 2019
Distributed Data
Location: 230 A
Average rating: 4.00 (9 ratings)

Level

Beginner

Prerequisite knowledge

  • Rudimentary knowledge of databases and distributed systems

What you'll learn

  • Learn classes of problems to expect as you scale your service
  • Get insights gained from operating a distributed data store at scale
  • Learn design principles to architect a highly reliable and operationally lightweight service

Description

Edgestore is a low-latency, distributed data store that is one of the largest services developed at Dropbox. It serves 10 million requests per second and stores over 10 trillion objects at rest. In the last few years, Edgestore has grown from being used by a handful of services to being the primary data store for all of Dropbox’s metadata needs.

Mehant Baid discusses the challenges Dropbox faced as it scaled Edgestore and traces the service's journey from an operationally burdensome system plagued by incidents and postmortems to one that is highly reliable and operationally lightweight.

Mehant begins with a brief introduction to Edgestore, then discusses the various ways Dropbox's systems and processes failed as the service scaled. He also shares the technical solutions Dropbox adopted to improve service reliability, along with their trade-offs. The problems he covers in depth include load management, client isolation, linearized reads from replicas, schema evolution, and data validation.

Mehant Baid

Dropbox

Mehant Baid is a software engineer at Dropbox. For the past few years, he’s been working on Edgestore, the distributed data store that handles all of Dropbox’s metadata needs. Previously, he worked on the database kernel at Oracle, where he focused on scaling inserts into the database. He’s a committer and project management committee member with the Apache Software Foundation and worked with the open source community to develop Apache Drill—an SQL engine for Hadoop, NoSQL, and cloud storage. His primary interests are the fields of distributed systems and databases.