Sep 23–26, 2019

Parquet Modular Encryption: Confidentiality and Integrity of Sensitive Column Data

1:15pm–1:55pm Wednesday, September 25, 2019
Location: 1E 09
Secondary topics:  Deep dive into specific tools, platforms, or frameworks, Health and Medicine, Privacy and Security

Who is this presentation for?

Systems and security architects, CTOs, and software engineers




Apache Parquet is a popular columnar format, used by many analytic frameworks for efficient storage and processing of big data. In many real-life use cases, parts of the data are highly sensitive and must be protected. The Parquet community is working on a column encryption mechanism that protects the confidentiality and integrity of sensitive Parquet data and enables access control for table columns. The modular design of the mechanism preserves Parquet's existing projection, predicate pushdown, encoding, and compression capabilities, which are required to accelerate analytic workloads.

Today, many leading companies in the big data and cloud domains take part in the community work on this technology. The Parquet modular encryption specification was recently completed and formally approved by the Apache Parquet PMC (project management committee).

In this talk, I will present the basics of the columnar encryption technology, its usage model, and an initial integration with analytic frameworks (e.g., Apache Spark). I will show two use cases: one related to connected cars (location, speed, and other sensitive data), the other to healthcare data processing (medical sensor records, managed with the increasingly popular HL7 FHIR standard). I will also describe the performance implications of applying modular encryption in analytic workloads.
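As a rough illustration of the usage model, the sketch below assembles the kind of properties that tell a Parquet writer which columns to encrypt and with which master keys. The property names are those documented for columnar encryption in later Apache Spark and parquet-mr releases, so the interface shown in the talk may have differed. The helper functions, key IDs, column names, and the KMS client class name are all hypothetical and chosen only to echo the connected-car use case.

```python
# Minimal sketch, not the speaker's code. Property names follow the columnar
# encryption documentation of later Apache Spark / parquet-mr releases.
# Helper names, key IDs, and column names are hypothetical.


def column_keys_property(keys_to_columns):
    """Format {"keyA": ["c1", "c2"], "keyB": ["c3"]} as "keyA:c1,c2;keyB:c3"."""
    return ";".join(
        f"{key_id}:{','.join(columns)}"
        for key_id, columns in keys_to_columns.items()
    )


def encryption_properties(kms_client_class, footer_key_id, keys_to_columns):
    """Build the properties that select which columns are encrypted with which key."""
    return {
        # Activates the properties-driven encryption factory in parquet-mr.
        "parquet.crypto.factory.class":
            "org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory",
        # Deployment-specific class that talks to the key management service.
        "parquet.encryption.kms.client.class": kms_client_class,
        # Master key that protects the file footer (schema and metadata).
        "parquet.encryption.footer.key": footer_key_id,
        # Each master key protects its own set of sensitive columns.
        "parquet.encryption.column.keys": column_keys_property(keys_to_columns),
    }


props = encryption_properties(
    kms_client_class="com.example.MyKmsClient",  # hypothetical class name
    footer_key_id="footer_key",
    keys_to_columns={
        "location_key": ["latitude", "longitude"],
        "speed_key": ["speed"],
    },
)

for name, value in props.items():
    print(f"{name} = {value}")
```

Because each sensitive column has its own master key, a reader that is granted only `speed_key` could still project and filter on `speed` without being able to decrypt the location columns. In a Spark job, properties like these would typically be supplied through the Hadoop configuration or as DataFrame write options.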

Prerequisite knowledge

Basic understanding of big data

What you'll learn

A new standard for protecting big data: how it works and how to use it in analytics on sensitive information

Gidon Gershinsky


Gidon is a lead architect at the IBM Research – Haifa Laboratory. He works on secure cloud analytics, data-at-rest and data-in-use encryption, and attestation of trusted computing enclaves. Currently, Gidon plays a leading role in the Apache Parquet community work on protecting sensitive data in analytic workloads. Gidon completed a Ph.D. at the Weizmann Institute of Science in Israel and was a postdoctoral fellow at Columbia University, NYC.

Contact list

View a complete list of Strata Data Conference contacts