Presented By O'Reilly and Cloudera
Make Data Work
December 1–3, 2015 • Singapore

Patterns and paradigms: Managing semi-structured data with high velocity change for large scale e-commerce

Utkarsh B (Flipkart Internet Private Limited), Vinod Venkatraman (Flipkart Internet Private Limited)
11:50am–12:30pm Thursday, 12/03/2015
Hadoop & Beyond
Location: 328-329 Level: Intermediate
Tags: commerce

Prerequisite Knowledge

Brief understanding of HBase and Elasticsearch.

Description

In this talk, we share our experience of developing “Hoodoo”, an in-house solution at Flipkart.com for managing the marketplace's enormous catalog. We’ll cover the paradigms and patterns that evolved over the lifecycle of the solution.

Hoodoo is a generic, distributed, and elastic data store abstraction for managing semi-structured data whose semantics and structural definition change at high velocity. It models and manages functional data with these traits using the primitive concepts of entities and relationships (E-R modelling). Hoodoo unifies data access patterns in its APIs (id-based access, parametrized queries, search, and so on) and provides tuneable consistency levels for stored data.
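To make the entity and relationship idea concrete, here is a minimal in-memory sketch. It is not Hoodoo's API: the class names, method names, and sample data (Entity, EntityStore, put, relate, query, related, the "Acme" products) are all hypothetical, chosen only to illustrate free-form attributes, id-based access, and a parametrized query over one store.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set, Tuple


@dataclass
class Entity:
    """A semi-structured record: fixed id and type, free-form attributes."""
    entity_id: str
    entity_type: str
    attributes: Dict[str, Any] = field(default_factory=dict)


class EntityStore:
    """Toy in-memory store; every name here is hypothetical."""

    def __init__(self) -> None:
        self._entities: Dict[str, Entity] = {}
        # Relationships kept as (source id, relation name, target id) triples.
        self._relations: Set[Tuple[str, str, str]] = set()

    def put(self, entity: Entity) -> None:
        self._entities[entity.entity_id] = entity

    def relate(self, source_id: str, relation: str, target_id: str) -> None:
        self._relations.add((source_id, relation, target_id))

    def get(self, entity_id: str) -> Entity:
        """Id-based access."""
        return self._entities[entity_id]

    def query(self, entity_type: str, **criteria: Any) -> List[Entity]:
        """Parametrized query: match on type plus attribute equality."""
        return [
            e for e in self._entities.values()
            if e.entity_type == entity_type
            and all(e.attributes.get(k) == v for k, v in criteria.items())
        ]

    def related(self, source_id: str, relation: str) -> List[Entity]:
        """Follow one named relationship from a source entity."""
        return sorted(
            (self._entities[t] for s, r, t in self._relations
             if s == source_id and r == relation),
            key=lambda e: e.entity_id,
        )


store = EntityStore()
# Two products of the same type carry different attribute sets.
store.put(Entity("p1", "product", {"brand": "Acme", "color": "red"}))
store.put(Entity("p2", "product", {"brand": "Acme", "storage_gb": 64}))
store.put(Entity("l1", "listing", {"price": 499}))
store.relate("p1", "has_listing", "l1")
```

In this sketch, adding a new attribute to one product needs no schema change, and a relationship can be added or dropped without touching either entity.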

Functional data can often be non-trivial to manage and serve, especially when it is constantly evolving. As an example, consider catalog data for a retail marketplace like Flipkart.

  • The metadata for a catalog entry is dynamic in nature (elasticity)
  • Catalog entries share meaningful associations that may be transient or static over time (flexibility)
  • The same data must be served through multiple views (semantic relevance)
  • The flux of change is large (variability)
  • All of this must be managed at a catalog data size of 3 billion and growing

Hoodoo uses the following patterns, techniques, and technologies:

  • HBase to store entities and Elasticsearch to index entity properties, enabling search as well as optimized id-based look-ups
  • Eventual consistency between the data stores, using techniques such as write-ahead logs that are then applied reliably
  • Support multi-tenancy and tuneable consistency schemes while serving data with low latencies at scale
  • Timestamp-consistent data views of entities and their associations
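
The write-ahead log pattern in the list above can be sketched as follows. This is a toy model under stated assumptions, not Hoodoo's implementation: plain dictionaries stand in for the primary store and the search index, and the names DualStore, write, apply_pending, search, and the fail_at parameter are hypothetical. The point it illustrates is that a write is logged before anything else happens, and the index catches up later by replaying the log in order.

```python
from typing import Any, Dict, List, Optional, Tuple


class DualStore:
    """Toy model: dicts stand in for the primary store and the search index."""

    def __init__(self) -> None:
        # Log entries are (sequence number, entity id, document).
        self.log: List[Tuple[int, str, Dict[str, Any]]] = []
        self.primary: Dict[str, Dict[str, Any]] = {}
        self.index: Dict[str, Dict[str, Any]] = {}
        self.applied_seq = 0  # highest log sequence applied to the index

    def write(self, entity_id: str, doc: Dict[str, Any]) -> int:
        """Append to the log first, then update the primary store."""
        seq = len(self.log) + 1
        self.log.append((seq, entity_id, dict(doc)))
        self.primary[entity_id] = dict(doc)
        return seq

    def apply_pending(self, fail_at: Optional[int] = None) -> None:
        """Replay unapplied log entries into the index, in order.

        Progress is recorded per entry, so a failure part-way through
        (simulated by fail_at) can be retried without skipping entries.
        """
        for seq, entity_id, doc in self.log[self.applied_seq:]:
            if seq == fail_at:
                raise RuntimeError("simulated index outage")
            self.index[entity_id] = doc
            self.applied_seq = seq

    def search(self, **criteria: Any) -> List[str]:
        """Attribute-equality search served from the index only."""
        return sorted(
            eid for eid, doc in self.index.items()
            if all(doc.get(k) == v for k, v in criteria.items())
        )


store = DualStore()
store.write("p1", {"brand": "Acme"})
store.write("p2", {"brand": "Acme"})

try:
    store.apply_pending(fail_at=2)  # index goes down before entry 2
except RuntimeError:
    pass
partial = store.search(brand="Acme")  # index lags behind the primary store

store.apply_pending()  # retry resumes from the first unapplied entry
final = store.search(brand="Acme")
```

Between the failed attempt and the retry, id-based reads against the primary store already see both entities while search sees only one. That gap is the eventual consistency window this pattern accepts in exchange for never losing a write.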

Utkarsh B

Flipkart Internet Private Limited

Utkarsh B is principal architect at Flipkart, responsible for building the marketplace technology platform with a specific focus on building Cataloging as a Service. Utkarsh has extensive experience (12+ years) in large scale product/systems development, including seven years in the product development space. His specialties include service-oriented architecture, distributed computing, data storage techniques, MapReduce, large scale product architecture and design, and platform components/stacks.

Vinod Venkatraman

Flipkart Internet Private Limited

Vinod Venkatraman is currently helping build the marketplace technology platform at Flipkart. Vinod’s specialties are Core Java, Java concurrency, SOA, JMS, web services, JTA, web development, Spring, Struts, JDBC, SQL, JavaScript, Ajax, Ext-JS, DB schema design, and performance optimization.