Last year, LinkedIn set out on an ambitious mission to completely revamp the mobile experience for its members: an entirely new mobile application, reimagined user experiences, and new interaction concepts. When the team evaluated the impact of this big rewrite on the data analytics ecosystem, it found several problems.
Over the past few years, LinkedIn has become extremely good at changing the site incrementally, one mini-feature at a time, often alongside hundreds of other incremental changes. LinkedIn’s experimentation platform ensures that every change is monitored against a wide range of affected metrics before it is fully rolled out. A big change like this one, however, raises different challenges. You have to roll out the entire application all at once, and the new experience leaves you with no baseline for new metrics. Existing metrics may also see double-digit changes, either because of the new experience itself or because the metric’s logic is no longer accurate. The challenge is figuring out which is which.
Shirshanka Das and Yael Garten describe how LinkedIn redesigned its data analytics ecosystem in the face of a significant product rewrite, covering the infrastructure changes that let LinkedIn roll out future product innovations with minimal downstream impact. Shirshanka and Yael explore the motivations and building blocks for this reimagined ecosystem, including the technical details of LinkedIn’s new client-side tracking infrastructure, its unified reporting platform, and its data virtualization layer on top of Hadoop. They also share lessons learned from the data producers and consumers participating in this governance model. Along the way, they offer anecdotal evidence from the rollout that validated some of their decisions and is shaping the future roadmap of these efforts.
Shirshanka Das is a principal staff software engineer and the architect for LinkedIn’s analytics platforms and applications team. He was among the original authors of a variety of open- and closed-source projects built at LinkedIn, including Databus, Espresso, and Apache Helix. He’s working with his team to simplify the big data analytics space at LinkedIn through a number of mostly open-source projects, including Pinot, a high-performance distributed OLAP engine; Gobblin, a data lifecycle management platform for Hadoop; WhereHows, a data discovery and lineage platform; and Dali, a data virtualization layer for Hadoop.
Yael Garten is director of data science at LinkedIn, where she leads a team focused on understanding and increasing growth and engagement of LinkedIn’s 400 million members across mobile and desktop consumer products. Yael is an expert at converting data into actionable product and business insights that shape strategy. Her team partners with product, engineering, design, and marketing to optimize the LinkedIn user experience, creating powerful data-driven products that help LinkedIn’s members be productive and successful. Yael champions data quality at LinkedIn: she has devised organizational best practices for data quality and developed internal data tools to democratize data within the company. Yael also advises companies on informatics methodologies for transforming high-throughput data into insights and is a frequent conference speaker. She holds an MSc from the Weizmann Institute of Science in Israel and a PhD in biomedical informatics from the Stanford University School of Medicine, where her research focused on using natural language processing for information extraction to understand how human genetic variations affect drug response.
©2016, O'Reilly Media, Inc. • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with, and does not endorse or review, the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.