As companies continue to embrace modern architectures based on microservices and cloud applications, it has become increasingly difficult to physically consolidate all data into a single system. In a world where data is extremely fragmented and users expect instant gratification, the age-old approach of constructing and maintaining ETL pipelines can be prohibitively cumbersome and expensive.
Apache Arrow is an open source project, initiated by over a dozen open source communities, that provides a standard columnar in-memory data representation and processing framework. Arrow has emerged as a popular way to handle in-memory data for analytical purposes. In the last year, Arrow has been embedded into a broad range of open source (and commercial) technologies, including GPU databases, machine learning libraries and tools, execution engines, and visualization frameworks (e.g., Anaconda, Dremio, Graphistry, H2O, MapD, pandas, R, and Spark).
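To make "columnar in-memory representation" concrete, here is a minimal sketch, using only the Python standard library, of how a single nullable integer column could be laid out in the spirit of the Arrow format: one contiguous values buffer plus a separate validity bitmap in which bit i is set when slot i holds a value. `Int64Column` and its methods are hypothetical names for illustration only. This is not the pyarrow API, and it omits parts of the real specification such as buffer alignment and padding.

```python
from array import array
from typing import List, Optional


class Int64Column:
    """Hypothetical Arrow-style fixed-width column (illustration, not pyarrow).

    Values live in one contiguous buffer; nulls are tracked in a separate
    validity bitmap using least-significant-bit numbering within each byte.
    """

    def __init__(self, items: List[Optional[int]]) -> None:
        self.length = len(items)
        # Contiguous signed 64-bit buffer; null slots hold a placeholder 0.
        self.values = array("q", (0 if v is None else v for v in items))
        # One bit per slot, rounded up to whole bytes.
        self.validity = bytearray((self.length + 7) // 8)
        self.null_count = 0
        for i, v in enumerate(items):
            if v is None:
                self.null_count += 1
            else:
                self.validity[i // 8] |= 1 << (i % 8)

    def is_valid(self, i: int) -> bool:
        return bool(self.validity[i // 8] & (1 << (i % 8)))

    def get(self, i: int) -> Optional[int]:
        return self.values[i] if self.is_valid(i) else None

    def sum(self) -> int:
        # An analytical scan reads only this column's buffers, not other fields.
        return sum(v for i, v in enumerate(self.values) if self.is_valid(i))


# Row-oriented records, as an application might hold them.
rows = [
    {"id": 1, "amount": 250},
    {"id": 2, "amount": None},
    {"id": 3, "amount": 100},
]

# Column-oriented: each field becomes its own contiguous column.
amount = Int64Column([r["amount"] for r in rows])
print(amount.sum(), amount.null_count)  # prints: 350 1
```

Keeping each field in its own buffer is what lets analytical scans read only the columns they need, and agreeing on one standard layout is what lets different systems hand such buffers to each other without converting them row by row.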
Tomer Shiran offers an overview of Arrow, shows how companies can use Arrow to let users access and analyze data across disparate data sources without having to physically consolidate it into a centralized data repository, and explains how several open source projects are using it to achieve high-performance data processing and interoperability across systems. Along the way, Tomer shares examples such as a 50x speedup in PySpark (Spark-pandas interoperability) and a join between Parquet files on S3, Oracle tables, and Elasticsearch indices. Tomer concludes by outlining Apache Arrow's 12-month roadmap.
Tomer Shiran is cofounder and CEO of Dremio, the data lake engine company. Previously, Tomer was the vice president of product at MapR, where he was responsible for product strategy, roadmap, and new feature development and helped grow the company from 5 employees to over 300 employees and 700 enterprise customers. Before that, he held numerous product management and engineering positions at Microsoft and IBM Research. He's the author of eight US patents. Tomer holds an MS in electrical and computer engineering from Carnegie Mellon University and a BS in computer science from the Technion, the Israel Institute of Technology.
©2018, O'Reilly UK Ltd