The Big Data Ecosystem at LinkedIn

Data: Big Data
Location: B118-119

The last few years have brought a wealth of new data technologies organized around horizontal scalability. LinkedIn has built out an ecosystem of infrastructure to support products that use data in innovative ways and place significant demands on that infrastructure. This talk will cover the essential areas of technology and how LinkedIn has met those needs with a mixture of great Apache projects like Hadoop, ZooKeeper, Pig, and Avro, as well as a set of open source projects of our own creation, such as Voldemort, Kafka, and Azkaban.

Hadoop is the key ingredient for offline computation, but creating an agile system for offline computing requires a lot more than just a Hadoop cluster.

Stream processing is an underutilized model that enables real-time data processing. Kafka is LinkedIn’s open source framework that enables map/reduce-like processing without the high-latency turnaround of Hadoop jobs.

Finally, live serving and data deployment are the last mile of analytical data processing: getting terabytes of data delivered and available for serving with low latency is what actually gets your data in front of your users.

The focus of this talk will be to tell the story of how we began to understand these problems, the pitfalls along the way, and how products on our site take advantage of this ecosystem.


Jay Kreps


Jay Kreps is the cofounder and CEO of Confluent, a company focused on Apache Kafka. Previously, Jay was one of the primary architects at LinkedIn, where he focused on data infrastructure and data-driven products. He was among the original authors of a number of open source projects in the scalable data systems space, including Voldemort (a key-value store), Azkaban, Kafka (a distributed messaging system), and Samza (a stream processing system).



Sheeri K. Cabral
09/04/2011 10:00pm PDT

A video for this talk can be found online at