LinkedIn processes an enormous number of events each day. This data is critically important to the data analysts, engineers, business experts, and data scientists who seek a deep understanding of the interactions within LinkedIn’s professional social graph. They use this data to derive insights and performance metrics, which lead to better business decisions on products, marketing, sales, and other functional areas. Areas of interest include Email, Growth, Engagement, and Trending metrics. Internal tools have traditionally been built for a specific need and optimized for one business use case, and they do not interoperate. The engineering challenge is to let business users easily access and organize huge amounts of data in a comprehensive way, and to get to the insights they need flexibly and quickly through graphs and charts. The data needs to be sufficiently granular to serve different needs, the interface needs to be intuitive and simple, and the infrastructure needs to be high-performance so that users can manipulate large amounts of data quickly.
The solution to this challenge was built by the LinkedIn Business Analytics and Data Analytics Infrastructure teams using an integrated stack that includes an interactive analytics infrastructure and a self-serve data visualization frontend. The user interface lets users build charts, tables, and queries to suit highly customized reporting needs on any device. The backend infrastructure is based on Hadoop and leverages LinkedIn’s investment in highly scalable, data-rich systems. The combined solution makes it possible to visualize, slice, dice, and drill through billions of records and hundreds of dimensions quickly and at scale.
In this talk, you will learn the background of the data challenges that LinkedIn faced, how the teams came together to construct the solution, and the underlying stack structure powering this solution.
Praveen Neppalli Naga leads LinkedIn’s Distributed Data Analytics team and is responsible for building a distributed infrastructure for all interactive analytics needs at LinkedIn. The infrastructure supports both LinkedIn’s member- and customer-facing analytics and its internal analytics.
Chi-Yi Kuan is director of data science at LinkedIn. He has over 15 years of experience in applying big data analytics, business intelligence, risk and fraud management, data science, and marketing mix modeling across various business domains (social networking, ecommerce, SaaS, and consulting) at both Fortune 500 firms and startups. Chi-Yi is dedicated to helping organizations become more data-driven and profitable. He combines deep expertise in analytics and data science with business acumen and dynamic technology leadership.
Jonathan leads LinkedIn’s Business Analytics Solution team, which focuses on technical solutions. As the team’s technical leader, Jonathan provides end-to-end big data analytics solutions, including data integration, data processing, and data visualization. His team delivers easy, fast, and scalable data-driven solutions that LinkedIn’s Product Monetization, Global Sales, Marketing, and Operations teams use to analyze data and make decisions. Before LinkedIn, Jonathan spent over 10 years in data warehousing and business intelligence, working for eBay, HP, and Baosight.