Hadoop is responsible for computing a wide array of data
products at LinkedIn, including People You May Know (LinkedIn’s
people recommendation service), People Who Viewed This Also
Viewed (LinkedIn’s collaborative filtering), Who’s Viewed My
Profile?, Career Center, LinkedIn’s job recommendations, and
more. These products are immensely successful and extremely data
intensive: People You May Know, for example, generates a
significant portion of the invitations on LinkedIn, churning
through over 50 TB of data every day.
In this talk, I will detail the pieces of infrastructure that
make this possible (all of them open source), so that
attendees can build their own data products. I will also give
tips & tricks that we have learned, sometimes painfully, along
the way. This talk is geared towards the intermediate Hadoop user
who perhaps has a few jobs that compute some data, but wants to
learn how to turn them into a production process. There will
also be some nuggets for advanced users on how LinkedIn deals
with big data.
The talk will be subdivided into four “proverbs,” as follows.
Sam Shah is a Senior Software Engineer in the
Search, Network, and Analytics Team at LinkedIn,
working on applied data products. He is
particularly involved in the relevance backends
behind “People You May Know,” LinkedIn’s people
recommendation service, and LinkedIn’s
collaborative filtering system. He holds a Ph.D.
from the University of Michigan.