Presented By O’Reilly and Cloudera

San Jose • London • New York

Make Data Work

March 5–6, 2018: Training
March 6–8, 2018: Tutorials & Conference
San Jose, CA

Presto query gate: Identifying and stopping rogue queries

Ritesh Agrawal (Uber), Anirban Deb (Uber)

2:40pm–3:20pm Wednesday, March 7, 2018

Big data and data science in the cloud, Data engineering and architecture
Location: 230 C

Who is this presentation for?

Data scientists, engineering managers, platform managers, and platform engineers

Prerequisite knowledge

A basic understanding of Hadoop, Presto, and machine learning concepts (e.g., trees, XGBoost, and deep learning)

What you'll learn

Learn how Uber uses machine learning to identify and stop rogue queries in Presto, saving both computational power and money

Description

In today’s analytics-driven world, the challenge is not gathering data but finding actionable insights in petabytes of it quickly and efficiently. In recent years, Presto has taken on this challenge with fast, efficient in-memory computation that makes querying petabytes of data an almost real-time experience. Nevertheless, all systems have limitations. For Presto, these system constraints are often expressed as limits on maximum run time, memory, and CPU.

Generally, a very small portion of SQL queries (usually less than 0.5%) actually hit these system constraints, but those queries consume a significant amount of compute resources, reducing Presto’s overall computational efficiency, increasing query latency, and lowering overall query throughput.

Ritesh Agrawal and Anirban Deb explain how Uber uses machine learning to identify and stop rogue queries, saving both computational power and money. Uber has developed and deployed an innovative two-phase solution to make its model fast and intelligent. The solution has a less than 0.5% false positive rate and saves more than 40% of otherwise wasted computational resources. At Uber’s scale, this has not only fueled quicker analytics but also saved thousands of dollars and led to longer and better utilization of existing computational resources.

Ritesh Agrawal

Uber

Ritesh Agrawal leads the intelligent infrastructure systems team at Uber, which focuses on scaling data infrastructure for Uber’s growing business needs, now and for the foreseeable future. A leading data scientist in infrastructure optimization, Ritesh previously specialized in predictive and ranking models at Netflix, AT&T Labs, and Yellow Pages, where he built scalable machine learning infrastructure with technologies such as Docker, Hadoop, and Spark. He holds a PhD in environmental earth science from Pennsylvania State University, where his thesis focused on computational tools and technologies such as concept map ontologies.

Anirban Deb

Uber

Anirban Deb is a data science manager at Uber. A seasoned data science and analytics leader, Anirban has extensive experience building and managing high-performing teams to support strategic decision making, business analytics, marketing analytics, product analytics, predictive modeling, reporting, and executive communication.

Comments on this page are now closed.

Comments

Saju Joseph | 03/15/2018 9:41pm PDT

Any chance to share the presentation? Also will you be making this feature available for others to use?
Thanks
