Workday is on a mission to provide strong privacy and security for ML products built with customer data. With HR and financial data from half of the Fortune 500 companies, privacy and security are extremely important. A key enabler of the company’s mission is an architecture guided by the principles of privacy by design and data protection by default. Privacy by design and regulations such as GDPR or the California Consumer Privacy Act (CCPA) have two important aspects. The first is stating clearly and explicitly what data the customer is opting to allow access to and what purpose it will be used for. The second is enforcing access control and tracking lineage to ensure that data is accessed only by authorized users and used only for authorized purposes. LN Renganarayana shares the architectural design and the pragmatic trade-offs Workday has made to implement the second aspect.
The fundamental building block is the abstraction of a dataset and the operations publish and checkout. This abstraction, coupled with metadata capture and tracking, provides the basis for granular versioning and lineage tracking, both important for ensuring data collected for a purpose is used only for that purpose. Role-based access control on datasets enables granular, dataset-level control to ensure only authorized users have access to the datasets. Further, the publish and checkout operations on datasets turn out to be good leverage points for transparent and always-on security (encryption). Workday’s ML platform implements these abstractions to provide a high-productivity environment for data scientists. Time to market is as important a goal as privacy by design. Workday’s architecture optimizes for time to market by minimizing undifferentiated heavy lifting through the use of AWS services and open source software. It balances these two goals with a pragmatic solution: AWS ML tools such as SageMaker, Service Catalog, and MXNet enable data scientist productivity, while S3, IAM, and Lambda implement access control and security.
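To make the dataset abstraction concrete, here is a minimal Python sketch of publish and checkout with role-based access control, versioning, purpose checks, and lineage tracking. It is an illustration only, not Workday's implementation: the in-memory `DatasetRegistry`, the `DatasetVersion` record, and the role, dataset, and purpose names are all hypothetical, and encryption and durable storage are left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Set, Tuple

VersionRef = Tuple[str, int]  # (dataset name, version number)


class AccessDenied(Exception):
    """Raised when a role or a stated purpose is not authorized."""


@dataclass(frozen=True)
class DatasetVersion:
    name: str
    version: int
    purpose: str                     # purpose the data was collected for
    parents: Tuple[VersionRef, ...]  # dataset versions this one was derived from
    payload: bytes


class DatasetRegistry:
    """Hypothetical in-memory registry; a real system would use durable storage."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[DatasetVersion]] = {}
        self._grants: Dict[str, Set[str]] = {}  # dataset name -> allowed roles
        self.audit_log: List[Tuple[str, str, str, int]] = []

    def grant(self, dataset: str, role: str) -> None:
        self._grants.setdefault(dataset, set()).add(role)

    def _check_role(self, role: str, dataset: str) -> None:
        if role not in self._grants.get(dataset, set()):
            raise AccessDenied(f"role {role!r} has no access to {dataset!r}")

    def publish(self, role: str, name: str, payload: bytes, purpose: str,
                parents: Sequence[VersionRef] = ()) -> DatasetVersion:
        """Store a new immutable version and record what it was derived from."""
        self._check_role(role, name)
        history = self._versions.setdefault(name, [])
        dv = DatasetVersion(name, len(history) + 1, purpose, tuple(parents), payload)
        history.append(dv)
        self.audit_log.append(("publish", role, name, dv.version))
        return dv

    def checkout(self, role: str, name: str, purpose: str,
                 version: Optional[int] = None) -> DatasetVersion:
        """Return a version only if the role is granted and the purpose matches."""
        self._check_role(role, name)
        history = self._versions[name]
        dv = history[-1] if version is None else history[version - 1]
        if dv.purpose != purpose:
            raise AccessDenied(f"{name!r} was not collected for {purpose!r}")
        self.audit_log.append(("checkout", role, name, dv.version))
        return dv

    def lineage(self, name: str, version: int) -> List[VersionRef]:
        """Walk parent links to list every upstream dataset version."""
        seen: List[VersionRef] = []
        stack: List[VersionRef] = [(name, version)]
        while stack:
            n, v = stack.pop()
            for parent in self._versions[n][v - 1].parents:
                if parent not in seen:
                    seen.append(parent)
                    stack.append(parent)
        return seen


# Usage with hypothetical roles and datasets.
registry = DatasetRegistry()
registry.grant("raw_invoices", "data_engineer")
registry.grant("raw_invoices", "ocr_scientist")
registry.grant("ocr_training_set", "ocr_scientist")

registry.publish("data_engineer", "raw_invoices", b"scanned invoice bytes",
                 purpose="ocr_training")
source = registry.checkout("ocr_scientist", "raw_invoices", purpose="ocr_training")
registry.publish("ocr_scientist", "ocr_training_set", b"labeled examples",
                 purpose="ocr_training", parents=[(source.name, source.version)])
upstream = registry.lineage("ocr_training_set", 1)
```

Because every read and write goes through `publish` and `checkout`, those two calls are the natural place to attach access checks, audit records, and (in a fuller version) encryption, which is the leverage point the abstract describes.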
LN outlines the privacy-related requirements and describes an architecture that helps build privacy-preserving ML products without sacrificing time to market. The design is implemented across AWS and Workday data centers and is used by production ML products. He uses a couple of production ML products to illustrate the architecture: a deep learning-based OCR system and a knowledge graph for skills built using word embeddings. If you are a believer in privacy-preserving data/ML products or if you are looking for ideas to implement requirements from GDPR or CCPA, you’ll find this presentation useful and will walk away with a few pragmatic solutions that you can try out for your data products.
LN is a technology leader who has helped bring to life several data and ML products over the past 15 years. As the Head of Data Science and ML Architecture at Workday, he is helping Workday build cutting-edge enterprise AI products on a secure ML platform. He believes in ethical AI development and regularly partners with privacy, legal, and security experts to codify complex compliance rules into concrete software services. Previously, he was a researcher at the IBM T.J. Watson Research Center, where he built cloud anomaly detectors and ML-driven compiler optimizations, and a director at Symantec, where he helped build a real-time streaming analytics service.
LN holds a PhD in computer science, has 40+ publications and patents, and has received awards from ACM, IBM, and HP for the innovation and business impact of his work. Outside of his passion for technology, LN loves chocolates, will drive miles for a good coffee, and is an avid practitioner of meditation.
©2019, O'Reilly Media, Inc. • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.