Presented By O'Reilly and Cloudera
Make Data Work
March 28–29, 2016: Training
March 29–31, 2016: Conference
San Jose, CA

Lessons learned building a scalable self-serve, real-time, multitenant monitoring service at Yahoo

Sumeet Singh (Yahoo), Mridul Jain (Yahoo)
11:50am–12:30pm Thursday, 03/31/2016
Data Innovations

Location: 230 C
Average rating: 3.80 out of 5 (5 ratings)

Prerequisite knowledge

Attendees should have a basic understanding of big data systems and appreciate the importance of real-time processing and monitoring systems.


Building a real-time monitoring service that handles millions of custom events per second while simultaneously satisfying complex rules, varied throughput requirements, and numerous dimensions is a difficult undertaking. Sumeet Singh and Mridul Jain explain how Yahoo approached these challenges with Apache Storm Trident, Kafka, HBase, and OpenTSDB and discuss the lessons learned along the way.

Sumeet and Mridul explain scaling patterns backed by real scenarios and data to help attendees develop their own architectures and strategies for dealing with the scale challenges that come with real-time big data systems. They also explore the tradeoffs made in catering to a diverse set of daily users and the associated usability challenges that motivated Yahoo to build a self-serve, easy-to-use platform that requires minimal programming experience. Sumeet and Mridul then discuss event-level tracking for debugging and troubleshooting problems that users may encounter at this scale. Over the course of their talk, they also address building infrastructure and operational intelligence with anomaly detection, alert correlation, and trend analysis on top of the monitoring platform.

Sumeet Singh


Sumeet Singh is a senior director of product management for cloud and big data platforms at Yahoo. In his current role, he leads the Hadoop products team responsible for both Apache open source contributions and Yahoo projects. Sumeet is responsible for introducing several new multitenant cloud services at Yahoo that are now the cornerstone of most of Yahoo’s next-generation consumer product offerings and user experiences. Sumeet has 16 years of experience in product management and software development in the technology industry. He earned an MBA from the UCLA Anderson School of Management and an MS from Rensselaer Polytechnic Institute.

Mridul Jain


Mridul Jain is a senior principal architect for Yahoo’s monitoring platform. He has been using Storm and Kafka to solve various real-time problems at Yahoo for almost three years. Mridul is also the author of Pig on Storm. His interests lie mainly in real-time stream processing and machine learning. In prior roles, he worked on real-time search trend analysis using language models as well as cloud platform architecture.