Presented By O'Reilly and Cloudera
Make Data Work
December 1–3, 2015 • Singapore

Reliable data propagation between SQL and NoSQL databases using Aesop

Regunath Balasubramanian (Flipkart Internet)
1:30pm–2:10pm Wednesday, 12/02/2015
Hadoop & Beyond
Location: 328-329 Level: Intermediate
Tags: commerce
Average rating: 3.67 (3 ratings)
Slides: PDF

Prerequisite Knowledge

Experience building moderate to high performance online user-facing systems using RDBMS or other databases. Basic knowledge of different types of data stores, properties that describe data, and some prior reading on distributed systems.

Description

Large-scale internet systems often use a combination of relational (SQL) and non-relational (NoSQL) data stores. Contrary to product claims, it is hard to find a single data store that meets all the common read-write patterns of online applications. Different databases optimize for specific workload patterns and for particular durability and consistency guarantees: they use memory buffer pools, write-ahead logs, flash-optimized storage, and so on. These data stores are not operated in isolation and must share data and updates with one another. For example, a high-performance, memory-based key-value (KV) cache might need to be updated when data in the source-of-truth RDBMS or columnar database changes.
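As a rough illustration of the cache-update case, the sketch below applies row-level change events from a source-of-truth database to an in-memory key-value cache: inserts and updates overwrite the cached entry, and deletes invalidate it. The `ChangeEvent` type and the `apply_to_cache` function are hypothetical names for this example, not part of Aesop's API; in practice such events would typically be read from the source database's change or replication log.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical row-level change captured from the source-of-truth database."""
    op: str                      # "upsert" or "delete"
    key: str                     # primary key of the changed row
    value: Optional[str] = None  # new row image; None for deletes


def apply_to_cache(cache: Dict[str, str], event: ChangeEvent) -> None:
    """Propagate one change from the source database to the KV cache."""
    if event.op == "upsert":
        cache[event.key] = event.value
    elif event.op == "delete":
        # Invalidate the entry instead of leaving stale data behind.
        cache.pop(event.key, None)
    else:
        raise ValueError(f"unknown op: {event.op}")


cache = {"user:1": "alice@old.example"}
for ev in [
    ChangeEvent("upsert", "user:1", "alice@new.example"),
    ChangeEvent("upsert", "user:2", "bob@example.com"),
    ChangeEvent("delete", "user:2"),
]:
    apply_to_cache(cache, ev)

print(cache)  # {'user:1': 'alice@new.example'}
```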

This talk discusses general approaches to change data propagation and the implementation details of the open-source project Aesop, including some of its live deployments. Aesop supports single-node deployments and also scales to multi-node, partitioned clusters that process data concurrently at high throughput.

Aesop scales by partitioning the data stream and coordinates across subscription nodes using ZooKeeper. It provides at-least-once delivery guarantees and timeline-ordered data updates.
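The partitioning and delivery guarantees can be illustrated with a minimal sketch, assuming a single commit-ordered log in which every change carries a monotonically increasing sequence number. Changes are routed to partitions by a stable hash of the row key, so all updates to one key stay in order within one partition. Each partition consumer remembers the last sequence number it applied and ignores redelivered events, which makes at-least-once delivery safe to apply. All names here (`NUM_PARTITIONS`, `partition_for`, `PartitionConsumer`, `route`) are hypothetical; this is not Aesop's implementation, and the ZooKeeper-based assignment of partitions to subscriber nodes is left out.

```python
import zlib
from dataclasses import dataclass
from typing import Dict, List

NUM_PARTITIONS = 4  # hypothetical partition count for this sketch


@dataclass(frozen=True)
class ChangeEvent:
    seq: int    # position in the source's commit log (monotonically increasing)
    key: str    # row key used for partitioning
    value: str  # new row image


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Stable hash partitioning: every change to a key lands in the same
    partition, so per-key order survives concurrent processing."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class PartitionConsumer:
    """Applies one partition's events in log order, skipping redeliveries."""

    def __init__(self) -> None:
        self.store: Dict[str, str] = {}
        self.last_applied_seq = -1  # checkpoint; a real system would persist it

    def apply(self, event: ChangeEvent) -> bool:
        if event.seq <= self.last_applied_seq:
            return False  # duplicate from at-least-once redelivery: ignore
        self.store[event.key] = event.value
        self.last_applied_seq = event.seq
        return True


def route(events: List[ChangeEvent]) -> Dict[int, List[ChangeEvent]]:
    """Split a commit-ordered stream into per-partition streams, keeping order."""
    parts: Dict[int, List[ChangeEvent]] = {p: [] for p in range(NUM_PARTITIONS)}
    for ev in events:
        parts[partition_for(ev.key)].append(ev)
    return parts


log = [
    ChangeEvent(1, "order:7", "CREATED"),
    ChangeEvent(2, "order:9", "CREATED"),
    ChangeEvent(3, "order:7", "PAID"),
]
consumers = {p: PartitionConsumer() for p in range(NUM_PARTITIONS)}
for p, stream in route(log).items():
    for ev in stream + stream:  # simulate every event being delivered twice
        consumers[p].apply(ev)

p7 = partition_for("order:7")
print(consumers[p7].store["order:7"])  # PAID
```

Skipping events whose sequence number is not greater than the checkpoint works here only because filtering a commit-ordered log by partition keeps each partition's sequence numbers increasing; a system with per-partition sequence numbers or out-of-order sources would need a different deduplication rule.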

Aesop is used at scale in business-critical systems at Flipkart: the multi-tiered payments data store, the user wishlist system, and the streaming of facts to the data analysis platform. It has been used successfully to move millions of data records between MySQL, HBase, Redis, Kafka, and Elasticsearch clusters.

Aesop shares a common design approach and technologies with Facebook's Wormhole system.

Come attend this talk if you are evaluating data store(s) for your large scale service, or are grappling with more immediate problems like cache invalidation.


Regunath Balasubramanian

Flipkart Internet

Regunath Balasubramanian works at Flipkart as Principal Architect for Commerce and Supply Chain platforms. He also leads Flipkart’s open source initiatives and is a committer on a number of projects. Prior to Flipkart, he architected and built Aadhaar – the world’s largest biometric identity platform. His primary interest is in large scale distributed systems. Learn more about him at https://github.com/regunathb/.