Building a real-time monitoring service that handles millions of custom events per second while satisfying complex rules, varied throughput requirements, and numerous dimensions simultaneously is a complex endeavor. Sumeet Singh and Mridul Jain explain how Yahoo approached these challenges with Apache Storm Trident, Kafka, HBase, and OpenTSDB and discuss the lessons learned along the way.
Sumeet and Mridul explain scaling patterns backed by real scenarios and data to help attendees develop their own architectures and strategies for dealing with the scale challenges that come with real-time big data systems. They also explore the tradeoffs made in catering to a diverse set of daily users and the associated usability challenges that motivated Yahoo to build a self-serve, easy-to-use platform that requires minimal programming experience. Sumeet and Mridul then discuss event-level tracking for debugging and troubleshooting problems that our users may encounter at this scale. Over the course of their talk, they also address building infrastructure and operational intelligence with anomaly detection, alert correlation, and trend analysis based on the monitoring platform.
Sumeet Singh is a senior director of product management for cloud and big data platforms at Yahoo. In his current role, he leads the Hadoop products team responsible for both Apache open source contributions and Yahoo projects. Sumeet is responsible for introducing several new multitenant cloud services at Yahoo that are now the cornerstone of most of Yahoo’s next-generation consumer product offerings and user experiences. Sumeet has 16 years of experience in product management and software development in the technology industry. He earned an MBA from the UCLA Anderson School of Management and an MS from Rensselaer Polytechnic Institute.
Mridul Jain is a senior principal architect for Yahoo’s monitoring platform. He has been using Storm and Kafka to solve various real-time problems at Yahoo for almost three years. Mridul is also the author of Pig on Storm. His interests are mostly in the area of real-time stream processing and machine learning. His prior roles include real-time search trend analysis using language models as well as cloud platform architecture.
©2016, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • firstname.lastname@example.org
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse, or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.