Presented By O’Reilly and Cloudera

San Francisco • London • New York

Make Data Work

21–22 May 2018: Training
22–24 May 2018: Tutorials & Conference
London, UK

Setting up a lightweight distributed caching layer using Apache Arrow

Jacques Nadeau (Dremio)

12:05–12:45 Thursday, 24 May 2018

Big data and data science in the cloud, Data engineering and architecture
Location: S11A
Level: Advanced

Average rating: 4.00 (3 ratings)

Who is this presentation for?

Data consumers, data scientists, data engineers, and developers

What you'll learn

Learn how Apache Arrow can speed access to data for multiple purposes

Description

Apache Arrow has quickly become the standard for high-performance in-memory processing. It integrates with major open source projects and efforts such as Spark, pandas, Parquet, Dremio, libgdf, and the GPU Open Analytics Initiative (GOAI). As the go-to representation for data processing and interchange, Arrow has substantially changed how well systems can share and process data. However, systems today generate Arrow-format data only ephemerally, and the translation from on-disk formats to Arrow can diminish the overall performance potential.

Jacques Nadeau offers an overview of a new Apache-licensed lightweight distributed in-memory cache that allows multiple applications to consume Arrow directly using the Arrow RPC and IPC protocols. You’ll explore the system design and deployment architecture, including the cache lifecycle, update patterns, cache cohesion, and appropriate use cases; learn how data science, analytical, and custom applications can all leverage the cache simultaneously; discover the trade-offs around in-memory representations, data size, and balancing working memory with cache overhead; explore security, upgrades, and versioning, with a focus on how to balance performance, access, and governance; and see a live demo, showing the impact on overall performance and end-user satisfaction.

Jacques Nadeau

Dremio

Jacques Nadeau is the cofounder and CTO of Dremio. Previously, he ran MapR’s distributed systems team; was CTO and cofounder of YapMap, an enterprise search startup; and held engineering leadership roles at Quigo, Offermatica, and aQuantive. Jacques is cocreator and PMC chair of Apache Arrow, a PMC member of Apache Calcite, a mentor for Apache Heron, and the founding PMC chair of the open source Apache Drill project.
