Presented By O'Reilly and Cloudera
Make Data Work
September 26–27, 2016: Training
September 27–29, 2016: Tutorials & Conference
New York, NY

Architecting for change: LinkedIn's new data ecosystem

Shirshanka Das (LinkedIn), Yael Garten (LinkedIn)
2:55pm–3:35pm Wednesday, 09/28/2016
Data-driven business
Location: 3D 12 Level: Intermediate
Average rating: 4.73 (11 ratings)

Prerequisite knowledge

  • A working knowledge of how Internet-enabled businesses with web and mobile applications perform application tracking, metrics computation, and reporting
  • An understanding of Hadoop’s capabilities in functioning as a data warehouse

What you'll learn

  • Explore LinkedIn's new data ecosystem, which creates clear contracts between data producers and consumers and enables the product to innovate without painful migrations for downstream data consumers
  • Get concrete suggestions around product tracking, metrics computation, reporting, and setting up a collaborative data governance process that brings together engineering, product, and data science teams for sustainable, accurate, and reliable data tracking

Description

    Last year, LinkedIn embarked on an ambitious mission to completely revamp the mobile experience for its members. This would mean a completely new mobile application, reimagined user experiences, and new interaction concepts. As the team evaluated the impact of this big rewrite on the data analytics ecosystem, they observed a few problems.

    Over the past few years, LinkedIn has become extremely good at incrementally changing the site one mini-feature at a time, often in conjunction with hundreds of other incremental changes. LinkedIn’s experimentation platform ensures that a wide gamut of impacted metrics is monitored with every change before it is rolled out fully. However, rolling out a big change like this brings different challenges. You have to roll out the entire application all at once; the new experience means that you have no baseline for new metrics; and existing metrics may see double-digit changes, either because of the new experience or because the metric’s logic is no longer accurate. The challenge is figuring out which is which.

    Shirshanka Das and Yael Garten describe how LinkedIn redesigned its data analytics ecosystem in the face of a significant product rewrite, covering the infrastructure changes that enable LinkedIn to roll out future product innovations with minimal downstream impact. Shirshanka and Yael explore the motivations and building blocks for this reimagined data analytics ecosystem and the technical details of LinkedIn’s new client-side tracking infrastructure, its unified reporting platform, and its data virtualization layer on top of Hadoop. They also share lessons learned from the data producers and consumers participating in this governance model. Along the way, they offer anecdotal evidence from the rollout that validated some of their decisions and is shaping the future roadmap of these efforts.

    Shirshanka Das


    Shirshanka Das is a principal staff software engineer and the architect for LinkedIn’s analytics platforms and applications team. He was among the original authors of a variety of open and closed source projects built at LinkedIn, including Databus, Espresso, and Apache Helix. He’s working with his team to simplify the big data analytics space at LinkedIn through a multitude of mostly open source projects, including Pinot, a high-performance distributed OLAP engine; Gobblin, a data lifecycle management platform for Hadoop; WhereHows, a data discovery and lineage platform; and Dali, a data virtualization layer for Hadoop.

    Yael Garten


    Yael Garten is director of data science at LinkedIn, where she leads a team that focuses on understanding and increasing growth and engagement of LinkedIn’s 400 million members across mobile and desktop consumer products. Yael is an expert at converting data into actionable product and business insights that impact strategy. Her team partners with product, engineering, design, and marketing to optimize the LinkedIn user experience, creating powerful data-driven products to help LinkedIn’s members be productive and successful. Yael champions data quality at LinkedIn; she has devised organizational best practices for data quality and developed internal data tools to democratize data within the company. Yael also advises companies on informatics methodologies to transform high-throughput data into insights and is a frequent conference speaker. She holds a PhD in biomedical informatics from the Stanford University School of Medicine, where her research focused on information extraction via natural language processing to understand how human genetic variations impact drug response, and an MSc from the Weizmann Institute of Science in Israel.

    Comments on this page are now closed.


    André Morrow
    10/04/2016 1:05pm EDT

    Please see link above.

    10/03/2016 12:04pm EDT

    Thanks a lot!

    Shirshanka Das
    10/03/2016 8:37am EDT

    Slides have been sent to the conference organizers, so they should be up soon. They are also up on SlideShare here:

    10/03/2016 6:27am EDT

    Can I please get the slides for this session? Thanks.