Massive growth in the size of business datasets leads many companies to Hadoop, an emerging architecture for parallel data processing (and a top-level Apache project). However, the migration path can be challenging, in part because MapReduce analyses are written in programming languages like Java and Python rather than SQL. Apache Pig is a high-level framework built on top of Hadoop that offers a powerful yet vastly simplified way to analyze data in Hadoop. It lets businesses harness the power of Hadoop through a simple language readily learnable by anyone who understands SQL. In this presentation, I will introduce Pig and show how it’s been used at Twitter to solve numerous analytics challenges that became intractable with our former MySQL-based architecture.
Kevin Weil leads the analytics team at Twitter, building distributed infrastructure and leveraging data analysis at massive scale to help grow the popular micro-blogging service. With millions of monthly site visitors and many more interacting through API-based third-party applications, Twitter has one of the world’s most varied and interesting datasets. Prior to joining Twitter, Kevin led the analytics team at the Kleiner Perkins-backed web media startup Cooliris. Kevin earned his bachelor’s degree in Mathematics and Physics from Harvard University and has a master’s degree in Physics from Stanford University.