Sep 23–26, 2019

Parquet modular encryption: Confidentiality and integrity of sensitive column data

1:15pm–1:55pm Wednesday, September 25, 2019
Location: 1E 09

Who is this presentation for?

  • Systems and security architects, CTOs, software engineers

Level

Intermediate

Description

Apache Parquet is a popular columnar format, used in many analytic frameworks for efficient storage and processing of big data. In many real-life use cases, parts of the data are highly sensitive and must be protected. The Parquet community is working on a column encryption mechanism that protects the confidentiality and integrity of sensitive Parquet data and enables access control for table columns. The modular design of the mechanism preserves Parquet's existing projection, predicate pushdown, encoding, and compression capabilities, which are required for accelerating analytic workloads.

Many leading companies in the big data and cloud domains are taking part in the community work on this technology. The Parquet modular encryption specification was recently completed and formally approved by the Apache Parquet project management committee (PMC).

Gidon Gershinsky explains the basics of the columnar encryption technology, its usage model, and an initial integration with analytic frameworks (e.g., Apache Spark). He details two use cases: one involving connected cars (location, speed, and other sensitive data) and another involving healthcare data processing (medical sensor records, managed by the increasingly popular HL7 Fast Healthcare Interoperability Resources (FHIR) standard). He also explores the performance implications of applying modular encryption in analytic workloads.

Prerequisite knowledge

  • A basic understanding of big data

What you'll learn

  • Understand the new standard for protecting big data: how it works and how to use it in analytics on sensitive information

Gidon Gershinsky

IBM

Gidon Gershinsky is a lead architect at IBM Research – Haifa. He works on secure cloud analytics, data-at-rest and data-in-use encryption, and attestation of trusted computing enclaves. Gidon plays a leading role in the Apache Parquet community's work on big data encryption and integrity verification technology. He earned a PhD at the Weizmann Institute of Science in Israel and was a postdoctoral fellow at Columbia University.
