Sep 23–26, 2019

Parquet modular encryption: Confidentiality and integrity of sensitive column data

1:15pm to 1:55pm, Wednesday, September 25, 2019
Location: 1E 09

Who is this presentation for?

  • Systems and security architects, CTOs, software engineers




Apache Parquet is a popular columnar format, used in many analytic frameworks for efficient storage and processing of big data. In many real-life use cases, parts of the data are highly sensitive and must be protected. The Parquet community is working on a column encryption mechanism that protects the confidentiality and integrity of sensitive Parquet data and enables access control for table columns. The modular design of the mechanism preserves Parquet's existing projection, predicate pushdown, encoding, and compression capabilities, which are required for accelerating analytic workloads.

Many leading companies in the big data and cloud domains are taking part in the community work on this technology. The specification of Parquet modular encryption was recently completed and formally approved by the Apache Parquet project management committee (PMC).

Gidon Gershinsky explains the basics of the columnar encryption technology, its usage model, and an initial integration with analytic frameworks such as Apache Spark. He details two use cases: one related to connected cars (location, speed, and other sensitive data) and another to healthcare data processing (medical sensor records managed under the increasingly popular HL7 Fast Healthcare Interoperability Resources (FHIR) standard). He also explores the performance implications of applying modular encryption in analytic workloads.

Prerequisite knowledge

  • A basic understanding of big data

What you'll learn

  • Understand the new standard for protecting big data: how it works and how to use it in analytics on sensitive information

Gidon Gershinsky


Gidon Gershinsky is a lead architect at IBM Research – Haifa. He works on secure cloud analytics, data-at-rest and data-in-use encryption, and attestation of trusted computing enclaves. Gidon plays a leading role in the Apache Parquet community's work on big data encryption and integrity verification technology. He earned a PhD at the Weizmann Institute of Science in Israel and was a postdoctoral fellow at Columbia University.
