July 12-13, 2017: Training
July 13-15, 2017: Tutorials & Conference
Beijing, China

ShadowMask: 脱敏你的敏感的大数据 (ShadowMask: Anonymize your sensitive big data)

此演讲使用中文 (This will be presented in Chinese)

李银辉 (万达网络科技集团), 千惠子 (万达网络科技集团)
16:20–17:00 Friday, 2017-07-14
安全 (Security)
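Location: 多功能厅5B+C (Function Room 5B+C)
观众水平 (Level): 中级 (Intermediate)
数据脱敏的基本概念。 大数据脱敏平台的架构设计。 如何对大数据进行数据脱敏。 (Basic concepts of data anonymization. The architecture of a big data anonymization platform. How to anonymize big data.)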

数据安全对于大数据平台来说至关重要,而数据泄露是数据安全的关键风险之一。支付卡行业数据安全标准(PCI-DSS)、金融现代化法案(GLBA)、BASEL II、欧盟个人数据保护指令、HIPAA 以及其他隐私保护法规均要求组织基于其用户的业务职能限制数据访问权限,以尽量减少敏感信息及个人信息的暴露和盗窃。

ShadowMask是开源的分布式数据脱敏项目,为Apache Hadoop等分布式数据平台上的超大规模数据集提供数据脱敏的能力。ShadowMask提供了功能丰富的数据脱敏API,用户可以根据特定需求,协调隐私数据保护与数据分析挖掘需求之间的平衡。



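1. 数据脱敏的基本概念:脱敏规则,泄露风险模型,信息丢失模型,基于规则/风险/信息的数据脱敏算法等。 (Basic concepts of data anonymization: anonymization rules, disclosure risk models, information loss models, and rule-, risk-, and information-based anonymization algorithms.)
2. ShadowMask基本架构。 (The basic architecture of ShadowMask.)
3. ShadowMask在实际项目中的应用实践,遇到的问题及挑战。 (Applying ShadowMask in real projects, and the problems and challenges encountered.)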
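
The concepts named in item 1 (anonymization rules, a disclosure risk model, an information loss model, and a risk-based algorithm) can be illustrated with a minimal single-machine sketch. It is not ShadowMask's API: every function name, the sample records, and the choice of k-anonymity as the risk model are assumptions made purely for illustration.

```python
from collections import Counter

# Hypothetical sample records; "age" and "zip" are treated as quasi-identifiers.
RECORDS = [
    {"age": 23, "zip": "100020", "phone": "13812345678"},
    {"age": 27, "zip": "100020", "phone": "13912345678"},
    {"age": 31, "zip": "100020", "phone": "13712345678"},
    {"age": 36, "zip": "100020", "phone": "13612345678"},
    {"age": 42, "zip": "100021", "phone": "13512345678"},
    {"age": 48, "zip": "100021", "phone": "13412345678"},
]


def mask_phone(phone):
    """Masking rule: keep the first 3 and last 2 characters, star the rest."""
    return phone[:3] + "*" * (len(phone) - 5) + phone[-2:]


def generalize_age(age, width):
    """Generalization rule: replace an exact age with a bucket of the given width."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def anonymize(records, width):
    """Apply both rules to every record and return new records."""
    return [
        {**r, "age": generalize_age(r["age"], width), "phone": mask_phone(r["phone"])}
        for r in records
    ]


def k_anonymity(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


def reidentification_risk(records, quasi_identifiers):
    """Disclosure risk model (assumed): worst-case risk is 1 / k."""
    return 1.0 / k_anonymity(records, quasi_identifiers)


def information_loss(width, domain_size=100):
    """Information loss model (assumed): 0.0 for exact ages, 1.0 for one bucket."""
    return (width - 1) / (domain_size - 1)


def choose_width(records, widths, k_target):
    """Risk-based search: the narrowest bucket width that still reaches k_target."""
    for width in sorted(widths):
        if k_anonymity(anonymize(records, width), ["age", "zip"]) >= k_target:
            return width
    return None


best = choose_width(RECORDS, [1, 5, 10, 20], k_target=2)
print(best, information_loss(best), anonymize(RECORDS, best)[0])
```

Trying the widths from narrowest to widest means the first one that meets the risk target is also the one with the least information loss among the candidates, which is the privacy versus utility trade-off in its simplest form. A distributed engine would compute the same group counts with a group-by over the full dataset.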

Data security is crucial for a big data platform, and data leakage is one of its major risks. PCI-DSS, GLBA, Basel II, the EU Data Protection Directive, HIPAA, and other privacy protection regulations all require organizations to restrict data access based on their users’ business functions to minimize exposure and theft of sensitive and personal information.

李银辉 and 千惠子 offer an overview of ShadowMask, an open source project for distributed data anonymization that provides data masking capabilities for very large datasets on distributed data platforms such as Apache Hadoop. ShadowMask provides a feature-rich data anonymization API that lets users balance privacy protection against their data analysis and mining needs, according to their specific requirements.

What requirements does ShadowMask address? How can you integrate it with other security components in a big data platform? How does it anonymize data on distributed engines? And how do you find an optimal anonymization solution? This session answers these questions one by one using real-world cases, walks you through privacy protection in practice, and offers a glimpse of future possibilities.

