Twitter users generate terabytes of data every day. With components like the streaming API, our dataset is increasingly available to developers, so our big data challenge is becoming your big data challenge as well.
In this talk we’ll discuss tools like Hadoop and Pig that let you do large-scale data crunching in parallel. We will focus on platform-specific examples, as well as best practices gleaned from working with Twitter data over the last 12 months.
Kevin Weil specializes in the technology behind distributed systems, parallel processing, and analytics, especially in the context of large datasets. He currently leads the analytics team at Twitter, using Hadoop and other big data analytics tools to crunch Twitter’s massive dataset and apply the findings to improve the product. Prior to Twitter, he was the first employee at next-generation web media startup Cooliris, backed and incubated by Kleiner Perkins. At Cooliris, he developed user growth and advertising-focused analytics on a server cluster running Hadoop, Pig, and Hive, open-source implementations of the Google technology stack that are central to companies like Facebook, Yahoo, and A9. Mr. Weil has also worked at municipal wireless network provider Tropos Networks, where he optimized the performance of citywide wireless mesh networks, and at Microsoft Research and SLAC.