The number of data tools has skyrocketed in recent years. These tools are powerful, but connecting them together is often challenging. Connections add processing time, are frequently single-streamed, and are often built on legacy interfaces such as ODBC, JDBC, and REST. Building a modern infrastructure requires using these tools together, since each part of your organization wants a best-of-breed approach to data science and engineering tools.
Apache Arrow strives to solve part of this problem by allowing these systems to interchange common representations of data through in-process and near-process communications. For distributed and more complex topologies, something better is needed. Enter Arrow Flight.
Arrow Flight is a new initiative within Apache Arrow focused on providing a high-performance protocol and set of libraries for communicating analytical data in large parallel streams. It’s composed of several different implementations and example integrations that allow data engineering organizations to quickly build up data services that can move data between commodity systems at very high speeds.
Jacques Nadeau walks you through the components of Arrow Flight, covering the types of operations available within Arrow Flight and how these operations can be used for different use cases. He then shares several existing Arrow Flight implementations that already provide better integration and performance. Along the way, Jacques reviews operational considerations, including performance benchmarking and how Arrow Flight implements collaborative backpressure, QoS, stream management, and security. He also shares a small example application, with code, that highlights Arrow Flight's strengths and capabilities. He concludes with a discussion of where Arrow Flight is going, opportunities for growth, and how it fits into the concept of data microservices.
Jacques Nadeau is the cofounder and CTO of Dremio. Previously, he ran MapR’s distributed systems team; was CTO and cofounder of YapMap, an enterprise search startup; and held engineering leadership roles at Quigo, Offermatica, and aQuantive. Jacques is cocreator and PMC chair of Apache Arrow, a PMC member of Apache Calcite, a mentor for Apache Heron, and the founding PMC chair of the open source Apache Drill project.
©2019, O'Reilly Media, Inc.