What happens if you take everything that is happening in your company (every click, every impression, every database change, every application log) and make it all available as a real-time stream of well-structured data?
I will discuss the experience at LinkedIn and elsewhere of moving from batch-oriented ETL to real-time streams. I'll talk about how the design and implementation of Apache Kafka were driven by the goal of acting as a real-time platform for event data. I will cover some of the challenges of scaling Kafka to hundreds of billions of events per day and of making data available to thousands of users, applications, and data systems in a self-service fashion.
I will describe how real-time streams can become the source of ETL into Hadoop or a relational data warehouse, and how real-time data can supplement the role of batch-oriented analytics in Hadoop or a traditional data warehouse.
I will also describe how applications and stream processing systems such as Storm or Samza can make use of these feeds for sophisticated real-time data processing as events occur.
Jay is one of the primary architects at LinkedIn, where he focuses on data infrastructure and data-driven products.
He has spent equal time working on data products such as predicting professional relationships (“People You May Know”) and collaborative filtering.
View a complete list of Strata + Hadoop World contacts
©2015, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • email@example.com
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse, or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.