Build resilient systems at scale
October 12–14, 2015 • New York, NY

Robust functional/performance regression analysis @Twitter

Puneet Khanduri (Sn126), Arun Kejariwal (Independent)
4:35pm–5:15pm Tuesday, 10/13/2015
Location: Beekman Parlor
Average rating: 3.80 out of 5 (20 ratings)

Prerequisite Knowledge

The talk shall be self-contained. No prerequisite is required.

Description

Agile development has become the norm. Although it shortens product development cycles, it often introduces more functional and performance regressions. In a service-oriented architecture (SOA) such as Twitter's, a regression in one service may cascade to one or more other services. Detecting such regressions manually is not feasible given the hundreds of services and the tens of thousands of metrics each service collects. To this end, we developed a novel tool called Diffy to detect such regressions automatically.

The key highlights of the talk are the following:

  • A simple yet effective approach for detecting functional regressions. False positives are minimized via statistical analysis of metrics obtained from a tuple <primary, secondary, candidate> of nodes, where the same traffic is sent to each node.
  • An ensemble approach to detecting performance regressions. The need for an ensemble of classifiers stems from the multifaceted characteristics of the performance data. To minimize the impact of hardware performance variability across nodes, we use two clusters (instead of a tuple of nodes), one running the release candidate and one running production code. The approach is robust to anomalies in the performance data.
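One way to read the first bullet is sketched below. It is an illustration only, not Diffy's actual implementation. It assumes that primary and secondary run the same production code, so any disagreement between them is noise, and that a field is suspect only when primary and candidate disagree noticeably more often than that. The names `FieldCounts`, `tally`, `suspected_regressions`, and the `min_excess` threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FieldCounts:
    """Per-field tallies over the same replayed requests (hypothetical structure)."""
    total: int = 0
    raw_diffs: int = 0    # requests where primary and candidate disagreed
    noise_diffs: int = 0  # requests where primary and secondary disagreed


def tally(responses):
    """Count disagreements per field.

    responses: iterable of (primary, secondary, candidate) dicts,
    one triple per request, all three produced from the same request.
    """
    counts = {}
    for primary, secondary, candidate in responses:
        fields = primary.keys() | secondary.keys() | candidate.keys()
        for field in fields:
            c = counts.setdefault(field, FieldCounts())
            c.total += 1
            if primary.get(field) != candidate.get(field):
                c.raw_diffs += 1
            if primary.get(field) != secondary.get(field):
                c.noise_diffs += 1
    return counts


def suspected_regressions(counts, min_excess=0.05):
    """Return fields whose candidate diff rate exceeds the noise rate.

    min_excess is an assumed, illustrative threshold on
    (raw diff rate - noise diff rate); it is not taken from the talk.
    """
    flagged = []
    for field, c in sorted(counts.items()):
        raw_rate = c.raw_diffs / c.total
        noise_rate = c.noise_diffs / c.total
        if raw_rate - noise_rate > min_excess:
            flagged.append(field)
    return flagged
```

Under this sketch, a field such as a timestamp, which differs between every pair of nodes, yields equal raw and noise rates and is not flagged, while a field that changes only on the candidate is.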

The proposed techniques work well with minute data. Diffy is used in production by multiple services at Twitter and has been built into the continuous build process to actively detect functional and performance regressions.
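The cluster-versus-cluster comparison for performance regressions could look roughly like the sketch below. It is not the ensemble used at Twitter, whose classifiers the abstract does not name. It assumes that "minute data" means one sample per node per minute, that higher values are worse (for example, latency), and that three simple robust checks vote by majority. All function names and thresholds are hypothetical.

```python
from statistics import median


def cluster_series(node_series):
    """Collapse per-node minute series into one cluster series.

    Taking the per-minute median across nodes dampens both slow
    hardware and single-node spikes.
    """
    return [median(values) for values in zip(*node_series)]


def mad(xs):
    """Median absolute deviation, a spread estimate robust to outliers."""
    m = median(xs)
    return median([abs(x - m) for x in xs])


def median_shift_vote(prod, cand, min_relative_increase=0.10):
    """Vote 1: candidate median is at least 10% above production (assumed cutoff)."""
    return median(cand) > median(prod) * (1 + min_relative_increase)


def mad_vote(prod, cand, min_scaled_shift=3.0):
    """Vote 2: median shift is large relative to production's normal spread."""
    spread = mad(prod)
    shift = median(cand) - median(prod)
    if spread == 0:
        return shift > 0
    return shift / spread > min_scaled_shift


def paired_vote(prod, cand, min_fraction=0.8):
    """Vote 3: candidate is worse than production in most minutes."""
    pairs = list(zip(prod, cand))
    worse = sum(1 for p, c in pairs if c > p)
    return worse / len(pairs) >= min_fraction


def performance_regression(prod_nodes, cand_nodes):
    """Majority vote of the three checks over the two cluster series."""
    prod = cluster_series(prod_nodes)
    cand = cluster_series(cand_nodes)
    votes = [
        median_shift_vote(prod, cand),
        mad_vote(prod, cand),
        paired_vote(prod, cand),
    ]
    return sum(votes) >= 2
```

Because every check relies on medians rather than means, a brief spike on one node moves neither cluster series much, which is one simple way to obtain the robustness to anomalies that the abstract describes.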

We shall take the audience through how the techniques are being used at Twitter with REAL data.


Puneet Khanduri

Sn126

Puneet most recently led the ML services team at Twitter, where he helped standardize the platform and tooling for ML across the company. During his time at Twitter, he also owned services and built OSS tools to help improve engineering productivity.

He recently left Twitter to work on his OSS project full-time. His users include engineers from Twitter, Airbnb, Mixpanel, Baidu, BMW, and many others.


Arun Kejariwal

Independent

Arun Kejariwal is an independent lead engineer. Previously, he was a statistical learning principal at Machine Zone (MZ), where he led a team of top-tier researchers and worked on research and development of novel techniques for install-and-click fraud detection, for assessing the efficacy of TV campaigns, and for optimizing marketing campaigns. His team also built novel methods for bot detection, intrusion detection, and real-time anomaly detection. At Twitter, he developed and open-sourced techniques for anomaly detection and breakout detection. His research includes the development of practical and statistically rigorous techniques and methodologies to deliver high performance, availability, and scalability in large-scale distributed clusters. Some of the techniques he helped develop have been presented at international conferences and published in peer-reviewed journals.


Comments

Tarun kumar
09/16/2015 5:13am EDT

How can I set up Diffy on my PC, and how can I start testing HTTP service calls?
