Presented by O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

Taming the ever-evolving compliance beast: Lessons learned at LinkedIn

Shirshanka Das (LinkedIn), Tushar Shanbhag (LinkedIn)
2:55pm–3:35pm Thursday, September 28, 2017
Data Engineering & Architecture, Law, ethics, governance
Location: 1A 01/02
Level: Intermediate
Secondary topics: Media

Who is this presentation for?

  • Privacy experts and big data practitioners

Prerequisite knowledge

  • An understanding of privacy and compliance regulations and the Hadoop and Kafka ecosystem, especially with regard to encryption, access control, and compliance

What you'll learn

  • Learn how LinkedIn protects member privacy in its scalable distributed data ecosystem built around Kafka and Hadoop


Just when you think you have your Kafka and Hadoop clusters set up and humming and you’re well on your way to democratizing data, you realize that you now face a very different set of challenges. You want to give your data scientists unfettered access to data, but at the same time, you need to preserve the privacy of your members, who have entrusted you with their data.

Shirshanka Das and Tushar Shanbhag outline the path LinkedIn has taken to protect member privacy in its scalable distributed data ecosystem built around Kafka and Hadoop. Like most companies, in its early days LinkedIn’s first priority was getting data flowing freely and reliably. Over the past few years, the company has made significant advances in data governance, going beyond expectations in honoring the commitments it has made to members about how it handles their data.

Shirshanka and Tushar explain how LinkedIn met the Irish Data Protection Commissioner’s requirement that member data be purged from all data systems, including Hadoop, within the required timeframe, and describe the systems the company had to build to do so. They also discuss three foundational building blocks for scalable data management that can meet data compliance regulations: a centralized metadata system, a standardized data movement platform, and a unified data access layer. Some of these systems are open source and may be useful to companies in a similar situation. Along the way, they look to the future, specifically to the General Data Protection Regulation (GDPR), which comes into effect in 2018, and outline LinkedIn’s plans for addressing its requirements.

But technology is only part of the solution. Shirshanka and Tushar also share the cultural and process changes they’ve seen at the company and the lessons they’ve learned about sustainable process and governance.


Shirshanka Das


Shirshanka Das is a principal staff software engineer and the architect for LinkedIn’s analytics platforms and applications team. He was among the original authors of a variety of open and closed source projects built at LinkedIn, including Databus, Espresso, and Apache Helix. He’s working with his team to simplify the big data analytics space at LinkedIn through a multitude of mostly open source projects, including Pinot, a high-performance distributed OLAP engine; Gobblin, a data lifecycle management platform for Hadoop; WhereHows, a data discovery and lineage platform; and Dali, a data virtualization layer for Hadoop.


Tushar Shanbhag


Tushar Shanbhag is head of data strategy and data products at LinkedIn. Tushar is a seasoned executive with a track record of building high-growth businesses at market-defining companies such as LinkedIn, Cloudera, VMware, and Microsoft. Most recently, Tushar was vice president of products and design at Arimo, an Andreessen Horowitz-backed company building data intelligence products using analytics and AI.



Shirshanka Das
10/01/2017 8:38am EDT

Slides will be linked here shortly.

They’re also up on SlideShare at:

Maxwell Goldbas | DATA ENGINEER
09/29/2017 9:43am EDT

Is there a link to the presentation from this session? It was extremely helpful.