Presented By O'Reilly and Cloudera
Make Data Work
Feb 17–20, 2015 • San Jose, CA

Design patterns for real time streaming data analytics

Sheetal Dolas (Hortonworks)
1:30pm–2:10pm Thursday, 02/19/2015
Hadoop in Action
Location: 210 A/E
Average rating: 3.36 (11 ratings)
Slides:   1-PPTX 

As businesses realize the power of Hadoop and large-scale data analytics, many are demanding large-scale, real-time streaming data analytics. Apache Storm and Apache Spark are platforms that can process large amounts of data in real time. However, building applications on these platforms that scale, process data reliably without any loss, satisfy functional needs, and at the same time meet strict latency requirements takes a lot of work to get right.
After implementing multiple large real-time data processing applications with these technologies in various business domains, we distilled commonly required solutions into generalized design patterns. These patterns are proven in very large production deployments, where they process millions of events per second, tens of billions of events per day, and tens of terabytes of data per day.

The patterns include latency-sensitive lossless micro-batching, high-scale data enrichment through external system lookups, dynamic rules and alerts, adaptive self-tuning, and real-time stream joins, to name a few.

This talk covers these proven design patterns. For each pattern, it covers the problem statement, the applicability of the pattern, the pattern design, and sample code demonstrating the implementation.

Attendees can take advantage of these patterns when building their own applications to improve their productivity, the quality of their solutions, and the success rate of their applications.

Sheetal Dolas


Sheetal is a Principal Architect at Hortonworks with strong expertise in the Hadoop ecosystem and rich field experience. He helps enterprises of all sizes solve their business problems strategically, functionally, and at scale using big data technologies.
He has over 14 years of IT experience and has served in key positions as Lead Big Data Architect, SOA Architect, and Technology Architect in multiple large and complex enterprise programs. He has extensive knowledge of big data and NoSQL technologies, including Hadoop, Hive, Pig, HBase, Storm, and Kafka, and has been working in this space for the last 4+ years. He has defined and established EDW architectures for multi-petabyte data warehouses on Hadoop platforms, working with 10s of PB of data and clusters of 1000s of nodes.

Comments on this page are now closed.


Christopher Ried
02/19/2015 5:55am PST

Could you post the slides of your talk?