Build resilient systems at scale
October 12–14, 2015 • New York, NY

Writing a polyglot datastore story

Joseph Lynch (Yelp), Josh Snyder (Yelp)
1:15pm–1:55pm Wednesday, 10/14/2015
Location: Regent Parlor
Average rating: 3.67 (12 ratings)
Slides:   1-PDF 

Prerequisite Knowledge

Participants should understand the fundamentals of service-oriented architecture and its value proposition in scaling websites and the engineering organizations responsible for them. A basic knowledge of operational tools like Puppet, Chef, Nagios, or Sensu is helpful but not required. Expertise in datastores will enrich the talk, but is by no means necessary.

Description

Within a service-oriented architecture, different services expect different guarantees from their datastores. Some services need proven technology that offers reliable transactions. Others need performant real-time search, and yet others need extremely high write throughput with tunable consistency. Which datastore a service chooses is one of the most important factors determining application flexibility, iteration speed, performance, and reliability in production.

Our session will chronicle a technical journey as Yelp experienced it, describing the problems we solved as we iterated on our SOA, growing from zero services to hundreds and moving from a primarily MySQL data tier to a polyglot one that includes Cassandra, Elasticsearch, and Zookeeper.

We will consider three major technical topics:

  • Deploying datastores in a manageable and isolated fashion, taking into account the dependencies required by a diverse service stack
  • Scaling a dynamic discovery and query routing layer to handle ever-changing clusters
  • Building a monitoring system capable of informing developers of problems before they impact their service

Complementing the technical aspects, we will discuss the organizational and cultural challenges of putting control over datastores in the hands of developers. We will first discuss how we structure our configuration management to enable developers to provision datastores without ever writing a line of Puppet, and we will cover how we isolate datastores and ensure that one service’s datastore cannot affect another’s.

We will then show how we do discovery and request routing with a novel combination of discovery coprocesses, service registries, and query proxies. Our infrastructure is designed so that datastore backends can appear and disappear, all without impacting user traffic or requiring application developers to change code.
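To make the routing idea concrete, here is a minimal sketch of a registry and a proxy working together. It is an illustration only, not Yelp's implementation: the `ServiceRegistry` and `QueryProxy` classes, their methods, and the addresses in the usage note are all hypothetical, and the round-robin policy is an assumption chosen for brevity.

```python
import itertools
import threading


class ServiceRegistry:
    """Hypothetical in-memory registry: datastore name -> live backend addresses."""

    def __init__(self):
        self._lock = threading.Lock()
        self._backends = {}  # name -> set of "host:port" strings

    def register(self, name, address):
        # A discovery coprocess would call this when a backend becomes healthy.
        with self._lock:
            self._backends.setdefault(name, set()).add(address)

    def deregister(self, name, address):
        # ...and this when the backend fails a health check or is retired.
        with self._lock:
            self._backends.get(name, set()).discard(address)

    def lookup(self, name):
        with self._lock:
            return sorted(self._backends.get(name, set()))


class QueryProxy:
    """Hypothetical proxy: applications address a stable name, never a host."""

    def __init__(self, registry):
        self._registry = registry
        self._counter = itertools.count()

    def route(self, name):
        # Consult the registry on every request so membership changes
        # take effect without any change to application code.
        backends = self._registry.lookup(name)
        if not backends:
            raise LookupError(f"no live backends for {name!r}")
        # Simple round-robin over whatever is currently registered.
        return backends[next(self._counter) % len(backends)]
```

In this sketch an application only ever calls `proxy.route("reviews-cassandra")` (a made-up name). Registering or deregistering a backend changes where the next query goes, and the caller never sees the change.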

Finally, we will show how we customize off-the-shelf monitoring tools like Graphite and Sensu to provide easy-to-use monitoring systems for developers. These systems strive not only to notify developers when their datastore is completely unavailable, but also to provide early warning so that self-healing systems or operators can remedy issues before they become sitewide problems.
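The difference between an early warning and an outage alert can be sketched as a two-threshold check. This is a hedged illustration, not the tooling described in the talk: `check_metric`, its parameters, and the sample values are hypothetical. The only borrowed convention is the Nagios-style status codes (0 OK, 1 WARNING, 2 CRITICAL) that Sensu checks also follow.

```python
OK, WARNING, CRITICAL = 0, 1, 2  # Nagios-style statuses, also used by Sensu checks


def check_metric(samples, warn_at, crit_at):
    """Return (status, message) based on the mean of recent metric samples.

    `samples` is a list of recent readings (for example replication lag in
    seconds, an assumed metric). Missing data is treated as CRITICAL because
    a silent datastore is indistinguishable from a dead one.
    """
    if not samples:
        return CRITICAL, "no data received"
    mean = sum(samples) / len(samples)
    if mean >= crit_at:
        return CRITICAL, f"mean {mean:.1f} >= critical threshold {crit_at}"
    if mean >= warn_at:
        # Early warning: still serving, but trending toward trouble.
        return WARNING, f"mean {mean:.1f} >= warning threshold {warn_at}"
    return OK, f"mean {mean:.1f} is within limits"
```

A WARNING result is the point where a self-healing system or an operator could step in, well before the CRITICAL threshold that would page someone.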

Joseph Lynch

Yelp

Joseph Lynch is a software engineer for Yelp who focuses on building data store and service infrastructure. Joey is a core contributor to Yelp’s data store platform, which has allowed Yelp to go from a primarily MySQL data tier to a polyglot data tier including Elasticsearch, Cassandra, and Zookeeper. He loves pushing the edge of how Yelp uses DevOps tools to automate infrastructure and has never met a problem he didn’t want to automate away. When not wrangling clusters of data stores, Joey enjoys building service discovery, reliable communication, fast deployment, and monitoring into Yelp’s SOA.

Josh Snyder

Yelp

Joshua Snyder is an SRE with a penchant for dealing with data. He likes designing web infrastructure that serves its functions reliably and silently. To that end, Josh is always rounding off the sharp corners of infrastructure and crafting new functionality to be as maintainable as possible. He works mostly in relational stores like MySQL and Postgres, but occasionally finds himself fiddling with Cassandra. Josh has previously spoken at MySQL Connect and Percona Live.

Stay Connected

Follow Velocity on Twitter Facebook Group Google+ LinkedIn Group

O’Reilly Media

Tech insight, analysis, and research