Presented By
O’Reilly + Cloudera

March 25-28, 2019
San Francisco, CA

Accelerating analytical antelopes: Integrating Apache Kudu's RPC into Apache Impala

Lars Volker (Cloudera), Michael Ho (Cloudera)

11:50am–12:30pm Wednesday, March 27, 2019

Data Engineering & Architecture
Location: 2004

Secondary topics: Streaming, realtime analytics, and IoT

Average rating:

(4.50, 6 ratings)

Who is this presentation for?

Cluster and system admins and distributed software engineers

Level

Advanced

Prerequisite knowledge

Basic knowledge of (distributed) system primitives, threads, processes, and synchronous and asynchronous remote procedure calls

What you'll learn

Learn how Impala has recently been improved to scale out better
Understand why asynchronous, multiplexed, feature-rich RPC frameworks are a key enabler of medium to large-scale applications
See how choosing the right RPC framework can improve scalability by an order of magnitude
Understand why replacing the RPC framework in an existing application is difficult but possible

Description

Since its initial release in 2012, Apache Impala has been deployed on a wide range of cluster sizes. In recent years, deployments grew to sizes where Impala’s RPC layer—based on Apache Thrift RPC—couldn’t keep up. Its synchronous nature and lack of connection multiplexing made Impala consume exorbitant amounts of kernel resources, often leading to instabilities and query failures.

In the past 18 months, Apache Kudu’s RPC framework (KRPC) has been successfully integrated into Impala. Originally developed for the Kudu project, it was built from the ground up to support asynchronous communication between a large number of nodes across multiplexed connections. It also comes with support for TLS and Kerberos.
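
To make the idea of asynchronous calls over a multiplexed connection concrete, here is a minimal sketch in Python. It is not KRPC's API or implementation: `MultiplexedChannel`, `call`, `serve`, and `demo` are hypothetical names, and an in-memory queue stands in for a single network connection. It shows only the general pattern, in which each request carries a call ID, several calls are in flight on one connection at once, and replies are matched to callers by ID even when they complete out of order.

```python
import asyncio
import itertools


class MultiplexedChannel:
    """Toy stand-in for ONE connection shared by many in-flight calls.

    Hypothetical illustration only; this is not how KRPC is implemented.
    """

    def __init__(self):
        self._next_id = itertools.count(1)
        self._pending = {}            # call id -> Future awaiting the reply
        self._wire = asyncio.Queue()  # simulated single connection
        self.max_in_flight = 0
        self.completion_order = []

    async def call(self, payload, delay):
        """Issue a request and wait for its reply without blocking other calls."""
        call_id = next(self._next_id)
        fut = asyncio.get_running_loop().create_future()
        self._pending[call_id] = fut
        self.max_in_flight = max(self.max_in_flight, len(self._pending))
        await self._wire.put((call_id, payload, delay))
        return await fut

    async def serve(self):
        """Simulated remote peer: starts a handler per request, never blocks."""
        handlers = set()
        while True:
            call_id, payload, delay = await self._wire.get()
            task = asyncio.create_task(self._handle(call_id, payload, delay))
            handlers.add(task)  # keep a reference until the handler finishes
            task.add_done_callback(handlers.discard)

    async def _handle(self, call_id, payload, delay):
        await asyncio.sleep(delay)  # pretend the work takes `delay` seconds
        self.completion_order.append(call_id)
        # Match the reply to the waiting caller by call id.
        self._pending.pop(call_id).set_result(payload.upper())


async def demo():
    channel = MultiplexedChannel()
    server = asyncio.create_task(channel.serve())
    # Three calls share one channel; the slowest one is issued first.
    results = await asyncio.gather(
        channel.call("a", 0.03),
        channel.call("b", 0.01),
        channel.call("c", 0.02),
    )
    server.cancel()
    return results, channel.max_in_flight, channel.completion_order


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In a synchronous design without multiplexing, each of these three calls would typically occupy its own connection and a blocked thread until its reply arrived. Here all three share one channel and no caller blocks another.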

Lars Volker and Michael Ho discuss Impala’s distributed execution in detail, cover KRPC’s properties, and explain how they integrated KRPC into Impala. Along the way, they demonstrate how KRPC enables Impala to scale beyond its previous limitations. They also touch on how Impala consumes KRPC as a library, so that other projects looking for a scalable RPC implementation can benefit from their experience.

Lars Volker

Cloudera

Lars Volker is a software engineer at Cloudera. He has worked on various parts of Apache Impala, including crash handling, its Parquet scanners, and scan range scheduling. Most recently, he worked on integrating Kudu’s RPC framework into Impala. Previously, he worked on various databases at SAP.

Michael Ho

Cloudera

Michael Ho is a software engineer at Cloudera. He has worked on various parts of the Apache Impala query execution engine, including reducing codegen time, overhauling expression evaluation, and, most recently, making Impala more scalable. Before Cloudera, Michael built hypervisors and VMMs at VMware.
