Presto is an open source distributed SQL engine that lets users interactively query a variety of data sources, including Hadoop HDFS, object stores such as S3 and Azure Blob Storage, NoSQL stores like Cassandra, relational databases (MySQL, Postgres, SQL Server, etc.), and even Kafka streams. Originally open sourced by Facebook, Presto is now developed by a healthy open source community and is used in production by organizations big and small, in every industry, wherever there are terabytes (or petabytes) of data to query or multiple data sources to federate. Presto has a proven record as the SQL-on-anything solution in terms of scalability, concurrency, and feature completeness.
Wojciech Biela and Piotr Findeisen offer an overview of Starburst’s Cost-Based Optimizer (CBO) for Presto, which brings a substantial performance boost. The optimizer is built on a foundation layer, a framework for modeling and calculating data statistics, and both were designed from scratch to fit Presto’s architecture and code base, opening a new chapter in Presto’s optimizing capabilities.
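As a rough illustration of what a cost-based optimizer does with data statistics, the toy sketch below picks a join order by estimating intermediate result sizes. It is not Starburst's implementation: the table names, row counts, selectivities, and the `estimate_cost` and `best_join_order` helpers are all hypothetical, and a real optimizer models far more than row counts.

```python
from itertools import permutations

# Hypothetical table statistics (row counts); not taken from any real catalog.
ROW_COUNTS = {"orders": 1_000_000, "customers": 50_000, "nations": 25}

# Hypothetical join selectivities, keyed by the unordered pair of joined tables.
SELECTIVITY = {
    frozenset({"orders", "customers"}): 1 / 50_000,
    frozenset({"customers", "nations"}): 1 / 25,
}


def estimate_cost(order):
    """Sum the estimated intermediate result sizes of a left-deep join order."""
    joined = {order[0]}
    rows = float(ROW_COUNTS[order[0]])
    cost = 0.0
    for table in order[1:]:
        # Apply every known join predicate between the new table and the
        # tables joined so far; with no predicate this is a cross join.
        selectivity = 1.0
        for other in joined:
            selectivity *= SELECTIVITY.get(frozenset({other, table}), 1.0)
        rows = rows * ROW_COUNTS[table] * selectivity
        cost += rows
        joined.add(table)
    return cost


def best_join_order(tables):
    """Exhaustively pick the cheapest order (fine for a handful of tables)."""
    return min(permutations(tables), key=estimate_cost)


if __name__ == "__main__":
    plan = best_join_order(["orders", "customers", "nations"])
    print(plan, estimate_cost(plan))
```

With these made-up numbers, the cheapest plans join the two small tables first and bring in the large `orders` table last, which is the kind of decision that is only possible once statistics are available.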
Wojciech and Piotr walk you through Presto fundamentals and then detail the Cost-Based Optimizer’s concepts and architecture. Along the way, they share the motivating use cases behind this feature as well as the fantastic performance improvements that it brings to Presto users. They conclude by discussing possible future improvements in this area.
Wojciech Biela is a co-founder of Starburst, where he’s responsible for product development. He has over 15 years’ experience building products and running engineering teams. Previously, Wojciech was the engineering manager at the Teradata Center for Hadoop, where he ran the Presto engineering operations in Warsaw, Poland. He also built and ran the Polish engineering team for a subsidiary of Hadapt, a pioneer in the SQL-on-Hadoop space (acquired by Teradata in 2014), and built and led teams on multiyear projects ranging from large custom ecommerce and SCM platforms to POS systems. Wojciech holds an MS in computer science from the Wroclaw University of Technology.
Piotr Findeisen is a software engineer and a founding member of the team at Starburst. He contributes to the Presto code base and is also active in the community. Piotr has been involved in the design and development of significant features like the Cost-Based Optimizer (still in development), spill to disk, correlated subqueries, and a plethora of smaller enhancements. Previously, Piotr worked at Teradata, where he was the top external Presto committer, and was a team leader at Syncron (a provider of cloud services for supply chain management), responsible for the product’s technical foundation and performance. Piotr holds an MS in computer science and a BSc in mathematics from the University of Warsaw.
©2019, O’Reilly UK Ltd • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com