The number of data tools has skyrocketed in recent years. These tools are powerful, but connecting them is often challenging. Connections increase processing time, are frequently single-streamed, and are often built on legacy interfaces like ODBC, JDBC, and REST. Yet building a modern infrastructure requires using these tools together, since each part of your organization wants to take a best-of-breed approach to data science and engineering tools.
Apache Arrow strives to solve part of this problem by allowing these systems to interchange common representations of data through in-process and near-process communications. For distributed and more complex topologies, something better is needed. Enter Arrow Flight.
Arrow Flight is a new initiative within Apache Arrow focused on providing a high-performance protocol and set of libraries for communicating analytical data in large parallel streams. It’s composed of several different implementations and example integrations that allow data engineering organizations to quickly build up data services that can move data between commodity systems at very high speeds.
Jacques Nadeau walks you through the components of Arrow Flight, covering the types of operations available within Arrow Flight and how these operations apply to different use cases. He then shares several existing Arrow Flight implementations that already provide better integration and performance. Along the way, Jacques also reviews operational considerations, including benchmarking performance and how collaborative backpressure, QoS, stream management, and security are implemented within Arrow Flight, and shares a small example application, with code, that highlights the strengths and capabilities of Arrow Flight. He concludes with a discussion of where Arrow Flight is going, opportunities for growth, and how it fits into the concept of data microservices.
Jacques Nadeau is the CTO and cofounder of Dremio. Jacques is also the founding PMC chair of the open source Apache Drill project, spearheading the project’s technology and community. Previously, he was the architect and engineering manager for Drill and other distributed systems technologies at MapR; was CTO and cofounder of YapMap, an enterprise search startup; and held engineering leadership roles at Quigo, Offermatica, and aQuantive.
©2019, O'Reilly Media, Inc.