As businesses realize the power of Hadoop and large-scale data analytics, many are demanding large-scale, real-time streaming data analytics. Apache Storm and Apache Spark are platforms that can process large amounts of data in real time. However, it takes a lot of work to build applications on these platforms that scale, process data reliably without loss, satisfy functional needs, and still meet strict latency requirements.
After implementing multiple large real-time data processing applications with these technologies in various business domains, we distilled the commonly required solutions into generalized design patterns. These patterns are proven in very large production deployments, where they process millions of events per second, tens of billions of events per day, and tens of terabytes of data per day.
This talk covers these proven design patterns. For each pattern, it presents the problem statement, the applicability of the pattern, the pattern design, and sample code demonstrating the implementation. Attendees can take advantage of these patterns when building their applications to improve their productivity, the quality of their solutions, and the success of their applications.
Sheetal Dolas is a principal architect at Hortonworks with strong expertise in the Hadoop ecosystem and rich field experience. He helps small to large enterprises solve their business problems strategically, functionally, and at scale by using big data technologies. Sheetal has over 14 years of IT experience and has served in key positions as lead big data architect, SOA architect, and technology architect in multiple large and complex enterprise programs. He has extensive knowledge of big data and NoSQL technologies, including Hadoop, Hive, Pig, HBase, Storm, and Kafka, and has been working in this space for the last four+ years. He has defined and established EDW architectures for multi-petabyte data warehouses on Hadoop platforms, and has dealt with tens of petabytes of data and clusters of thousands of nodes.
©2015, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • email@example.com
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse, or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.