Parquet modular encryption: Confidentiality and integrity of sensitive column data
Who is this presentation for?
- Systems and security architects, CTOs, software engineers
Apache Parquet is a popular columnar format, used in many analytic frameworks for efficient storage and processing of big data. In many real-life use cases, parts of the data are highly sensitive and must be protected. The Parquet community is working on a column encryption mechanism that protects the confidentiality and integrity of sensitive Parquet data and enables access control for table columns. The mechanism's modular design preserves Parquet's existing projection, predicate pushdown, encoding, and compression capabilities, which are required for accelerating analytic workloads.
Many leading companies in the big data and cloud domains are taking part in the community work on this technology. The Parquet modular encryption specification was recently completed and formally approved by the Apache Parquet project management committee (PMC).
Gidon Gershinsky explains the basics of the columnar encryption technology, its usage model, and an initial integration with analytic frameworks (e.g., Apache Spark). He details two use cases: one related to connected cars (location, speed, and other sensitive data), the other to healthcare data processing (medical sensor records, managed under the increasingly popular HL7 Fast Healthcare Interoperability Resources (FHIR) standard). He also explores the performance implications of applying modular encryption in analytic workloads.
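To give a rough feel for the usage model, here is a minimal sketch of how column-to-key assignments for the connected-car case might be expressed as configuration. It is an illustration and not material from the talk. The property names and the `keyID:column,column;keyID:column` value format follow the properties-driven interface that later shipped in parquet-mr and is documented for Apache Spark. The helper `build_encryption_conf`, the key IDs, and the column names are hypothetical.

```python
from typing import Dict, List


def build_encryption_conf(footer_key: str,
                          column_keys: Dict[str, List[str]]) -> Dict[str, str]:
    """Build Hadoop-style properties for Parquet column encryption.

    footer_key  -- ID of the master key protecting the file footer
    column_keys -- maps a master key ID to the columns it protects
    (Hypothetical helper; only the property names come from parquet-mr.)
    """
    if not column_keys:
        raise ValueError("at least one master key with columns is required")

    seen = set()
    parts = []
    for key_id, columns in column_keys.items():
        if not columns:
            raise ValueError(f"key {key_id!r} has no columns")
        for col in columns:
            # A column can be encrypted with only one master key.
            if col in seen:
                raise ValueError(f"column {col!r} assigned to two keys")
            seen.add(col)
        parts.append(f"{key_id}:{','.join(columns)}")

    return {
        "parquet.crypto.factory.class":
            "org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory",
        "parquet.encryption.footer.key": footer_key,
        "parquet.encryption.column.keys": ";".join(parts),
    }


# Hypothetical connected-car table: location and speed get separate keys,
# so access to each group of columns can be granted independently.
conf = build_encryption_conf(
    "k_footer",
    {"k_location": ["latitude", "longitude"], "k_speed": ["speed"]},
)
for name, value in conf.items():
    print(f"{name} = {value}")
```

In a real deployment these properties would be set on the Hadoop configuration used by the writer, together with a KMS client class (`parquet.encryption.kms.client.class`) that can resolve the master key IDs. Columns not listed under any key are left unencrypted.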
Prerequisite knowledge
- A basic understanding of big data
What you'll learn
- Understand the new standard for protecting big data: how it works and how to use it in analytics on sensitive information
Gidon Gershinsky is a lead architect at IBM Research – Haifa. He works on secure cloud analytics, data-at-rest and data-in-use encryption, and attestation of trusted computing enclaves. Gidon plays a leading role in the Apache Parquet community's work on big data encryption and integrity verification technology. He earned a PhD at the Weizmann Institute of Science in Israel and was a postdoctoral fellow at Columbia University.