Build resilient systems at scale
October 12–14, 2015 • New York, NY

Robust functional/performance regression analysis @Twitter

Puneet Khanduri (Twitter), Arun Kejariwal (Facebook)
4:35pm–5:15pm Tuesday, 10/13/2015
Location: Beekman Parlor
Average rating: 3.80 (20 ratings)

Prerequisite Knowledge

The talk is self-contained; no prerequisite knowledge is required.

Description

Agile development has become the norm. Although it enables faster product development cycles, it often results in more functional and/or performance regressions. In a service-oriented architecture (SOA) such as Twitter's, a regression in one service may cascade to one or more other services. Detecting such regressions manually is not feasible given the hundreds of services involved and the tens of thousands of metrics each service collects. To this end, we developed a novel tool called Diffy that detects such regressions automatically.

The key highlights of the talk are the following:

  • A simple yet effective approach for detecting functional regressions. False positives are minimized via statistical analysis of metrics obtained from a tuple <primary, secondary, candidate> of nodes, where the same traffic is sent to each node.
  • An ensemble approach to detecting performance regressions. The need for an ensemble of classifiers stems from the multifaceted characteristics of performance data. To minimize the impact of hardware performance variability across nodes, we compare two clusters, one running the release candidate and one running production code, rather than a tuple of nodes. The approach is robust to anomalies in the performance data.
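
The <primary, secondary, candidate> idea in the first highlight can be illustrated with a minimal Python sketch. It is not Diffy's actual implementation: the function names, the flat-dictionary response format, and the `min_gap` threshold are all assumptions made for illustration. The sketch assumes that primary and secondary run the same production code, so any field that differs between them is treated as noise (timestamps, random IDs), and a field is flagged only when it differs between primary and candidate noticeably more often than that noise.

```python
from collections import Counter

_MISSING = object()  # sentinel so an absent field differs from a field set to None


def differing_fields(a: dict, b: dict) -> set:
    """Return the top-level field names whose values differ between two responses."""
    return {k for k in a.keys() | b.keys()
            if a.get(k, _MISSING) != b.get(k, _MISSING)}


def flag_regressions(samples, min_gap=0.2):
    """Flag fields that differ between primary and candidate more often than noise.

    samples: iterable of (primary, secondary, candidate) response dicts, each
             triple produced by sending the same request to all three nodes.
    min_gap: hypothetical threshold on (raw diff rate - noise rate).
    Returns {field: (raw_rate, noise_rate)} for the flagged fields.
    """
    raw, noise = Counter(), Counter()
    total = 0
    for primary, secondary, candidate in samples:
        total += 1
        raw.update(differing_fields(primary, candidate))    # may be a regression
        noise.update(differing_fields(primary, secondary))  # same code: pure noise
    if total == 0:
        return {}
    flagged = {}
    for field, count in raw.items():
        raw_rate = count / total
        noise_rate = noise[field] / total
        if raw_rate - noise_rate > min_gap:
            flagged[field] = (raw_rate, noise_rate)
    return flagged
```

In this sketch, a timestamp field that differs on every request between primary and secondary also differs between primary and candidate, so its raw and noise rates cancel out and it is not reported. A field that is stable across the two production nodes but changes on the candidate is reported.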

The proposed techniques work well with minute data. Diffy is used in production by multiple services at Twitter and has been baked into the continuous build process to actively detect functional and/or performance regressions.
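
The ensemble idea for performance regressions can be sketched in the same spirit. The abstract does not name the classifiers used, so the three checks below, their thresholds, and the majority-vote rule are assumptions chosen only to illustrate the principle: several simple detectors vote on per-minute latency samples from a production cluster and a candidate cluster, and median-based statistics keep a single anomalous sample from deciding the outcome.

```python
import statistics


def median_shift(prod, cand, tol=0.10):
    """Vote 'regression' if the candidate median exceeds production by more than tol."""
    return statistics.median(cand) > statistics.median(prod) * (1 + tol)


def tail_shift(prod, cand, tol=0.10):
    """Same relative check on the 90th percentile (needs at least two samples each)."""
    def p90(xs):
        return statistics.quantiles(xs, n=10)[-1]
    return p90(cand) > p90(prod) * (1 + tol)


def mad_shift(prod, cand, k=3.0):
    """Vote 'regression' if the candidate median lies more than k MADs above production."""
    med = statistics.median(prod)
    mad = statistics.median([abs(x - med) for x in prod])  # median absolute deviation
    return statistics.median(cand) - med > k * max(mad, 1e-9)


def is_regression(prod, cand):
    """Majority vote over the three hypothetical detectors."""
    votes = [median_shift(prod, cand), tail_shift(prod, cand), mad_shift(prod, cand)]
    return sum(votes) >= 2


# Per-minute latency samples (ms); the production series contains one anomalous spike.
prod = [100, 102, 98, 101, 99, 103, 97, 100, 500, 100]
cand_ok = [101, 99, 100, 102, 98, 100, 103, 97, 101, 100]
cand_slow = [130, 128, 133, 131, 129, 127, 132, 130, 131, 129]
print(is_regression(prod, cand_ok), is_regression(prod, cand_slow))
```

With these made-up samples, the spike in the production series inflates its 90th percentile enough to mislead the tail check, while the two median-based checks still separate the slow candidate from the healthy one. That is the motivation for voting rather than relying on a single detector.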

We will walk the audience through how these techniques are used at Twitter, with real data.

Puneet Khanduri

Twitter

Puneet Khanduri is a senior engineer within the Engineering Effectiveness team at Twitter. His work at Twitter focuses on building tools and frameworks that help other engineering teams build more resilient systems. Prior to Twitter, Puneet worked at Oracle Labs where he built a real-time analytics platform for live sensor data. Puneet has also worked as a researcher at Sun Labs and on network and microprocessor architectures.

Arun Kejariwal

Facebook

Arun Kejariwal is a lead engineer at Facebook. Previously, he was a statistical learning principal at Machine Zone (MZ), where he led a team of top-tier researchers working on novel techniques for install-and-click fraud detection, assessing the efficacy of TV campaigns, and optimizing marketing campaigns. His team also built novel methods for bot detection, intrusion detection, and real-time anomaly detection. At Twitter, he developed and open-sourced techniques for anomaly detection and breakout detection. His research includes the development of practical and statistically rigorous techniques and methodologies to deliver high performance, availability, and scalability in large-scale distributed clusters. Some of the techniques he helped develop have been presented at international conferences and published in peer-reviewed journals.

Comments

Tarun kumar
09/16/2015 5:13am EDT

How can I set up Diffy on my PC, and how can I start testing HTTP service calls?
