Large-scale internet systems often use a combination of relational (SQL) and non-relational (NoSQL) data stores. Contrary to product claims, it is hard to find a single data store that meets the common read-write patterns of online applications. Different databases optimize for specific workload patterns, data durability, and consistency guarantees – they use memory buffer pools and write-ahead logs, optimize for flash storage, and so on. These data stores are not operated in isolation and must share data and updates with one another – for example, a high-performance, memory-based KV data cache might need to be updated when data in the source-of-truth RDBMS or columnar database changes.
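As an illustration of the cache example above, here is a minimal in-memory sketch of change propagation from a source-of-truth store to a cache. It is not Aesop code: `ChangeEvent`, `SourceOfTruth`, and `Cache` are hypothetical names invented for this sketch, and a real deployment would typically read changes from the database's own log rather than from an in-process callback.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical change record: one key and its new value."""
    key: str
    value: Optional[str]  # None means the row was deleted


class SourceOfTruth:
    """Stand-in for the source-of-truth database (assumption, in-memory only)."""

    def __init__(self) -> None:
        self.rows: Dict[str, str] = {}
        self.subscribers: List[Callable[[ChangeEvent], None]] = []

    def subscribe(self, callback: Callable[[ChangeEvent], None]) -> None:
        self.subscribers.append(callback)

    def write(self, key: str, value: Optional[str]) -> None:
        # Apply the write locally, then publish it to every subscriber.
        if value is None:
            self.rows.pop(key, None)
        else:
            self.rows[key] = value
        event = ChangeEvent(key, value)
        for callback in self.subscribers:
            callback(event)


class Cache:
    """Stand-in for a memory-based KV cache kept in sync by change events."""

    def __init__(self) -> None:
        self.entries: Dict[str, str] = {}

    def apply(self, event: ChangeEvent) -> None:
        if event.value is None:
            self.entries.pop(event.key, None)  # invalidate on delete
        else:
            self.entries[event.key] = event.value


db = SourceOfTruth()
cache = Cache()
db.subscribe(cache.apply)

db.write("user:1", "alice")
db.write("user:1", "alice-v2")  # update propagates to the cache
db.write("user:2", "bob")
db.write("user:2", None)        # delete invalidates the cached entry
```

After these writes the cache holds the same rows as the source store, without the application code that performed the writes knowing the cache exists.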
This talk discusses general approaches to change data propagation and specific implementation details of the open-source project Aesop, including some of its live deployments. It covers capabilities suitable for single-node deployment as well as multi-node partitioned clusters that process data concurrently at high throughput.
Aesop scales by partitioning the data stream and coordinates across subscription nodes using Zookeeper. It provides at-least-once delivery guarantees and timeline-ordered data updates.
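The combination of partitioning, at-least-once delivery, and ordered updates can be sketched roughly as follows. This is an assumption-laden toy, not Aesop's implementation: `Change`, `partition_for`, and `PartitionConsumer` are hypothetical names, the sequence number stands in for whatever ordering the source log provides, and the Zookeeper-based coordination between nodes is not modeled. Because at-least-once delivery can replay events, the consumer here skips any change whose sequence number it has already applied for that key.

```python
import zlib
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Change:
    """Hypothetical change record with a monotonically increasing sequence number."""
    seq: int
    key: str
    value: str


def partition_for(key: str, num_partitions: int) -> int:
    # Stable hash, so a given key always maps to the same partition
    # and its updates stay in order on one consumer.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class PartitionConsumer:
    """Applies changes for one partition and ignores replayed ones."""

    def __init__(self) -> None:
        self.state: Dict[str, str] = {}
        self.last_seq: Dict[str, int] = {}

    def apply(self, change: Change) -> bool:
        # At-least-once delivery may redeliver; skip anything already applied.
        if change.seq <= self.last_seq.get(change.key, -1):
            return False
        self.state[change.key] = change.value
        self.last_seq[change.key] = change.seq
        return True


NUM_PARTITIONS = 3
consumers: List[PartitionConsumer] = [PartitionConsumer() for _ in range(NUM_PARTITIONS)]

stream = [
    Change(1, "order:7", "created"),
    Change(2, "order:9", "created"),
    Change(3, "order:7", "paid"),
]
# Simulate redelivery of two events after the original stream.
delivered = stream + [stream[0], stream[2]]

applied = [
    consumers[partition_for(c.key, NUM_PARTITIONS)].apply(c) for c in delivered
]

# Each key lives on exactly one partition, so merging the states is safe.
merged: Dict[str, str] = {}
for consumer in consumers:
    merged.update(consumer.state)
```

The replayed events are ignored, so the merged view ends with `order:7` as `paid` rather than reverting to `created`.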
Aesop is used at scale in business-critical systems at Flipkart – the multi-tiered payments data store, the user wishlist system, and the streaming of facts to the data analysis platform. Aesop has been used successfully to move millions of data records between MySQL, HBase, Redis, Kafka, and Elasticsearch clusters.
Aesop shares a common design approach and technologies with the Facebook Wormhole system.
Come attend this talk if you are evaluating data store(s) for your large scale service, or are grappling with more immediate problems like cache invalidation.
Regunath Balasubramanian works at Flipkart as Principal Architect for Commerce and Supply Chain platforms. He also leads Flipkart’s open source initiatives and is a committer on a number of projects. Prior to Flipkart, he architected and built Aadhaar – the world’s largest biometric identity platform. His primary interest is in large scale distributed systems. Learn more about him at https://github.com/regunathb/.
©2015, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • email@example.com
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse, or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.