Presented By O'Reilly and Cloudera
December 5–6, 2016: Training
December 6–8, 2016: Tutorials & Conference
Singapore

From telco data to spatial-temporal intelligence APIs: Architecting through microservices

5:05pm–5:45pm Wednesday, December 7, 2016
Spark & beyond
Location: 321/322 Level: Advanced

What you'll learn

  • Understand the production architecture that DataSpark uses to process terabytes of spatial-temporal telco data each day in PaaS mode, and how DataSpark operates in SaaS mode

Description

Big data solutions that process data at terabyte scale and deliver real-time spatial-temporal insights demand a well-thought-out system architecture. The architecture should support a data pipeline that covers ingestion, processing, indexing, caching, and retrieval of insights, along with the collection and propagation of data metrics across all system layers. Chandras Sekhar Saripaka details the production architecture at DataSpark that works through terabytes of spatial-temporal telco data each day in PaaS mode and showcases how DataSpark operates in SaaS mode.

Topics include:

  • How DataSpark internalizes DevOps infrastructure into its architecture: A Docker-based platform on Mesos with microservices produces a resilient service ecosystem for managing modular APIs and packages the infrastructure to support both cloud and on-premises data centers.
  • How to do faster and better ETL: Componentizing Spark APIs together with other ecosystem tools makes it easier to run ETL from any source to any sink on Spark's distributed computing engine. This supports the ingestion and processing layers and allows faster operation in both streaming and batch modes.
  • How to translate features in the data into APIs, usable in dashboards or exposed directly, without sacrificing the data artifacts: Streamlining the indexing and caching process translates the semiprocessed data into temporal and spatial collections and caches.
  • How to generate flexible insights through APIs across different dates, hours, and minutes and across different locations: Docker-based microservices achieve this flexibility by querying temporal and location indices and caches.
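
To make the idea of querying on temporal and location indices concrete, here is a minimal sketch in plain Python. It is not DataSpark's implementation. The record shape, the cell IDs, the 15-minute bucket size, and the function names (bucket_start, build_index, query) are all assumptions chosen for illustration. An in-memory dictionary stands in for whatever index and cache store a production system would use.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical record shape: (timestamp, location_cell, device_count).
# The 15-minute bucket size is an illustrative assumption.
BUCKET = timedelta(minutes=15)
EPOCH = datetime(1970, 1, 1)


def bucket_start(ts):
    """Round a timestamp down to the start of its time bucket."""
    return ts - ((ts - EPOCH) % BUCKET)


def build_index(records):
    """Aggregate counts into an index keyed by (cell, bucket start)."""
    index = defaultdict(int)
    for ts, cell, count in records:
        index[(cell, bucket_start(ts))] += count
    return index


def query(index, cells, start, end):
    """Sum counts per cell over the buckets that overlap [start, end)."""
    totals = {cell: 0 for cell in cells}
    t = bucket_start(start)
    while t < end:
        for cell in cells:
            # .get avoids creating empty entries in the defaultdict.
            totals[cell] += index.get((cell, t), 0)
        t += BUCKET
    return totals


# Made-up sample data, for illustration only.
records = [
    (datetime(2016, 12, 7, 17, 5), "cell-A", 120),
    (datetime(2016, 12, 7, 17, 12), "cell-A", 30),
    (datetime(2016, 12, 7, 17, 20), "cell-A", 50),
    (datetime(2016, 12, 7, 17, 40), "cell-B", 75),
]
index = build_index(records)
result = query(
    index,
    ["cell-A", "cell-B"],
    datetime(2016, 12, 7, 17, 0),
    datetime(2016, 12, 7, 17, 30),
)
print(result)  # {'cell-A': 200, 'cell-B': 0}
```

Because the index is pre-aggregated by location and time bucket, a query for any date, hour, or minute range only has to look up the matching keys instead of rescanning raw records. This is the kind of flexibility the last topic describes.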

Chandras Sekhar Saripaka

DATASPARK

Chandra Sekhar Saripaka is a product developer, big data professional, and data scientist at DataSpark in Singtel. He has deep experience in financial products, CMS, and identity management and is an expert in data crunching at terabyte scale on graphs and Hadoop. Previously, Chandra carried out research on image search indexing and retrieval and built many architectures for enterprise integration and portals, a cloud search engine for ecommerce, and a framework for real-time news recommendation systems.