Presented By
O’Reilly + Cloudera
Make Data Work
29 April–2 May 2019
London, UK

The Presto Cost-Based Optimizer for interactive SQL on anything

Wojciech Biela (Starburst), Piotr Findeisen (Starburst)
11:15–11:55 Wednesday, 1 May 2019
Average rating: 3.12 (8 ratings)

Who is this presentation for?

  • Engineers, DevOps engineers, and those on the business side

Level

Beginner

Prerequisite knowledge

  • A basic understanding of Hadoop, HDFS, and SQL

What you'll learn

  • Explore the Presto Cost-Based Optimizer, its benefits, and its use cases

Description

Presto is an open source distributed SQL engine that lets users interactively query a variety of data sources, including Hadoop HDFS, object stores such as S3 and Azure Blob Storage, NoSQL stores like Cassandra, relational databases (MySQL, PostgreSQL, SQL Server, etc.), and even Kafka streams. Presto was originally open sourced by Facebook and is now developed by a healthy open source community. Organizations large and small, across industries, use it in production wherever there are terabytes (or petabytes) of data to query or multiple data sources to federate. Presto has a proven record as the SQL-on-anything solution in terms of scalability, concurrency, and feature completeness.

Wojciech Biela and Piotr Findeisen offer an overview of Starburst’s Cost-Based Optimizer (CBO) for Presto, which brings a substantial performance boost. The optimizer is built on a foundation layer, a framework for modeling and calculating data statistics. Both were designed from scratch to fit Presto’s architecture and code base, and together they open a new chapter in Presto’s optimizing capabilities.

Wojciech and Piotr walk you through Presto fundamentals and then detail the Cost-Based Optimizer’s concepts and architecture. Along the way, they share the use cases that motivated this feature and the performance improvements it brings to Presto users. They conclude by discussing possible future improvements in this area.

Wojciech Biela

Starburst

Wojciech Biela is a co-founder of Starburst, where he’s responsible for product development. He has over 15 years’ experience building products and running engineering teams. Previously, Wojciech was the engineering manager at the Teradata Center for Hadoop, running the Presto engineering operations in Warsaw, Poland. He also built and ran the Polish engineering team for a subsidiary of Hadapt, a pioneer in the SQL-on-Hadoop space (acquired by Teradata in 2014), and built and led teams on multiyear projects ranging from large custom ecommerce and SCM platforms to POS systems. Wojciech holds an MS in computer science from the Wroclaw University of Technology.

Piotr Findeisen

Starburst

Piotr Findeisen is a software engineer and a founding member of the team at Starburst. He contributes to the Presto code base and is also active in the community. Piotr has been involved in the design and development of significant features like the Cost-Based Optimizer (still in development), spill to disk, correlated subqueries, and a plethora of smaller enhancements. Previously, Piotr worked at Teradata, where he was the top external Presto committer, and was a team leader at Syncron (a provider of cloud services for supply chain management), responsible for the product’s technical foundation and performance. Piotr holds an MS in computer science and a BSc in mathematics from the University of Warsaw.