Presented By
O’Reilly + Cloudera
Make Data Work
29 April–2 May 2019
London, UK

Arun Kejariwal
Lead Engineer, Independent

@arun_kejariwal

Arun Kejariwal is an independent lead engineer. Previously, he was a statistical learning principal at Machine Zone (MZ), where he led a team of top-tier researchers and worked on research and development of novel techniques for install-and-click fraud detection, for assessing the efficacy of TV campaigns, and for optimizing marketing campaigns. His team also built novel methods for bot detection, intrusion detection, and real-time anomaly detection. At Twitter, he developed and open-sourced techniques for anomaly detection and breakout detection. His research includes the development of practical and statistically rigorous techniques and methodologies to deliver high performance, availability, and scalability in large-scale distributed clusters. Some of the techniques he helped develop have been presented at international conferences and published in peer-reviewed journals.

Sessions

13:30–17:00 Tuesday, 30 April 2019
Arun Kejariwal (Independent), Karthik Ramasamy (Streamlio), Ivan Kelly (Streamlio)
Average rating: ***..
(3.00, 10 ratings)
Many industry segments have been grappling with fast data (high-volume, high-velocity data). Arun Kejariwal and Karthik Ramasamy walk you through state-of-the-art systems for each stage of an end-to-end pipeline for processing real-time data (messaging, compute, and storage), along with algorithms for extracting insights (e.g., heavy hitters and quantiles) from data streams.
12:05–12:45 Wednesday, 1 May 2019
Data Science, Machine Learning & AI
Location: Capital Suite 17
Arun Kejariwal (Independent), Ira Cohen (Anodot)
Average rating: ****.
(4.00, 5 ratings)
Sequence-to-sequence modeling (seq2seq) is now being used for applications based on time series data. Arun Kejariwal and Ira Cohen offer an overview of seq2seq and explore its early use cases. They then walk you through leveraging seq2seq modeling for these use cases, particularly with regard to real-time anomaly detection and forecasting.
14:55–15:35 Wednesday, 1 May 2019
Arun Kejariwal (Independent), Karthik Ramasamy (Streamlio)
Average rating: ***..
(3.00, 1 rating)
Arun Kejariwal and Karthik Ramasamy walk you through an architecture in which models are served in real time and updated, using Apache Pulsar, without restarting the application at hand. They then describe how to apply Pulsar functions to support two example use cases, sampling and filtering, and explore a concrete case study of the same.