Presented By O'Reilly and Cloudera
Make Data Work
Sept 29–Oct 1, 2015 • New York, NY

Design patterns for real-time data analytics

Sheetal Dolas (Hortonworks)
2:55pm–3:35pm Wednesday, 09/30/2015
Location: 3D 06/07

As businesses realize the power of Hadoop and large-scale data analytics, many are demanding large-scale, real-time streaming data analytics. Apache Storm and Apache Spark are platforms that can process large amounts of data in real time. However, building applications on these platforms that scale, reliably process data without loss, satisfy functional needs, and at the same time meet strict latency requirements takes a lot of work to get right.

After implementing multiple large real-time data processing applications with these technologies across various business domains, we distilled commonly required solutions into generalized design patterns. These patterns are proven in very large production deployments, where they process millions of events per second, tens of billions of events per day, and tens of terabytes of data per day.

This talk covers these proven design patterns. For each pattern, it presents the problem statement, the pattern's applicability, the pattern design, and sample code demonstrating its implementation. Attendees can take advantage of these patterns when building their own applications to improve their productivity, the quality of their solutions, and the success of their applications.

Sheetal Dolas


Sheetal Dolas is a principal architect at Hortonworks with strong expertise in the Hadoop ecosystem and rich field experience. He helps small to large enterprises solve their business problems strategically, functionally, and at scale using big data technologies. Sheetal has over 14 years of IT experience and has served in key positions as lead big data architect, SOA architect, and technology architect in multiple large and complex enterprise programs. He has extensive knowledge of big data/NoSQL technologies, including Hadoop, Hive, Pig, HBase, Storm, and Kafka, and has been working in this space for more than four years. He has defined and established EDW architectures for multi-petabyte data warehouses on Hadoop platforms, and has dealt with tens of petabytes of data and clusters of thousands of nodes.

Elizabeth Barayuga
12/16/2015 5:31am EST

Is there a link to the slide?