In today’s analytics-driven world, the challenge is not gathering data but finding actionable insights in petabytes of it quickly and efficiently. In recent years, Presto has taken on this challenge with fast, efficient in-memory computation that makes querying petabytes of data an almost real-time experience. Nevertheless, all systems have limitations. For Presto, these constraints are often expressed as limits on maximum run time, memory, and CPU.
Generally, only a very small portion of SQL queries (usually less than 0.5%) actually hit these system constraints, but those queries consume a significant amount of compute resources, reducing Presto’s overall computational efficiency, increasing query latency, and lowering overall query throughput.
Ritesh Agrawal and Anirban Deb explain how Uber uses machine learning to identify and stop rogue queries, saving both computational power and money. Uber has developed and deployed an innovative two-phase solution to make its model fast and intelligent. The solution has a less than 0.5% false positive rate and saves more than 40% of otherwise wasted computational resources. At Uber’s scale, this has not only fueled quicker analytics but also saved thousands of dollars and led to longer and better utilization of existing computational resources.
Ritesh Agrawal leads the intelligent infrastructure systems team at Uber, which focuses on scaling data infrastructure for Uber’s current and foreseeable business needs. A leading data scientist for optimizing infrastructure, Ritesh previously specialized in predictive and ranking models at Netflix, AT&T Labs, and Yellow Pages, where he built scalable machine learning infrastructure with technologies such as Docker, Hadoop, and Spark. He holds a PhD in environmental earth science from Pennsylvania State University, where his thesis focused on computational tools and technologies such as concept map ontologies.
Anirban Deb is a data science manager at Uber. A seasoned data science and analytics leader, Anirban has extensive experience building and managing high-performing teams to support strategic decision making, business analytics, marketing analytics, product analytics, predictive modeling, reporting, and executive communication.
View a complete list of Strata Data Conference contacts
©2018, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com
Comments
Any chance to share the presentation? Also will you be making this feature available for others to use?
Thanks