Presented By
O’Reilly + Cloudera
Make Data Work
March 25-28, 2019
San Francisco, CA

Loosely coupled data with Apache Arrow Flight

Jacques Nadeau (Dremio)
1:50pm–2:30pm Thursday, March 28, 2019

Who is this presentation for?

  • Data architects, data engineers, BI architects, and data scientists

Level

Intermediate

Prerequisite knowledge

  • Familiarity with SQL, analytics, and cloud services for data storage and compute

What you'll learn

  • Explore Apache Arrow Flight, a new way to exchange and analyze data between systems using an optimal format and libraries for CPU/GPU and RAM efficiency

Description

The number of data tools has skyrocketed in recent years. These tools are all very powerful, but connecting them is often challenging. Connections increase processing time, are frequently single-streamed, and are often built on legacy interfaces like ODBC, JDBC, and REST. Building a modern infrastructure requires using these tools together, since each part of your organization wants to assemble its own best-of-breed set of data science and engineering tools.

Apache Arrow strives to solve part of this problem by allowing these systems to interchange common representations of data through in-process and near-process communications. For distributed and more complex topologies, something better is needed. Enter Arrow Flight.

Arrow Flight is a new initiative within Apache Arrow focused on providing a high-performance protocol and set of libraries for communicating analytical data in large parallel streams. It’s composed of several different implementations and example integrations that allow data engineering organizations to quickly build up data services that can move data between commodity systems at very high speeds.

Jacques Nadeau walks you through the components of Arrow Flight, covering the types of operations available within Arrow Flight and how these operations apply to different use cases. He then shares several existing Arrow Flight implementations that already provide better integration and performance. Along the way, Jacques also reviews operational considerations, including benchmarking performance and how collaborative backpressure, QoS, stream management, and security are implemented within Arrow Flight, and shares a small example application, with code, that highlights the strengths and capabilities of Arrow Flight. He concludes with a discussion of where Arrow Flight is going, opportunities for growth, and how it fits into the concept of data microservices.

Jacques Nadeau

Dremio

Jacques Nadeau is the CTO and cofounder of Dremio. Jacques is also the founding PMC chair of the open source Apache Drill project, spearheading the project’s technology and community. Previously, he was the architect and engineering manager for Drill and other distributed systems technologies at MapR; was CTO and cofounder of YapMap, an enterprise search startup; and held engineering leadership roles at Quigo, Offermatica, and aQuantive.
