How to Win Friends and Influence People (Using Hadoop)

Hadoop: Case Studies, Gramercy Suite (NY Hilton)

Hadoop is gaining momentum at many companies as a tool for log analysis and business reporting. It is a great tool for solving these problems, but it can also be used to build much more interesting data applications.

Hadoop is a general-purpose, high-performance data processing pipeline. At LinkedIn, the largest professional social network, we use Hadoop for several uncommon and interesting use cases. For instance, we treat marketing as a recommendation problem rather than a sales problem. We use Hadoop for our recommendation, data processing, and content delivery pipelines, approaching marketing as a scientific process that teaches us how to advertise better. As part of this effort, we've developed a Hadoop-based system that generates and prioritizes marketing email messages. As another example, we use Hadoop to generate updates in a member's news feed. This system can deliver rich analytical insights to members or quickly prototype an idea for a new kind of update, all with a one-line command simple enough for even product managers to use. As a final example, we use Hadoop to power several recommendation systems, including People You May Know.

In this talk, we'll describe how LinkedIn uses Hadoop for these use cases. We'll describe in detail the systems and tools we have built to run Hadoop in production pipelines (such as Azkaban and Kafka) and share interesting things we've learned along the way. We'll also show how Hadoop lets us come up with ideas, test them rapidly, and quickly turn them into scalable production processes.

Sam Shah


SkipFlag is turning conversations into knowledge.

Joseph Adler


Joseph Adler has many years of experience in data mining and data analysis at companies including DoubleClick, American Express, and VeriSign. He graduated from MIT with a B.Sc. and an M.Eng. in Computer Science and Electrical Engineering. He holds several patents in computer security and cryptography and is the author of "Baseball Hacks" and "R in a Nutshell". He is currently a senior data scientist at LinkedIn.

Sam Shah
10/29/2012 4:03pm EDT

Michael, the slides are available here:

michael semb wever
10/25/2012 2:46pm EDT

Will the slides and video for this presentation be published? (It was an awesome presentation.)
