Amazon’s Elastic MapReduce (EMR) APIs provide a rich interface for running Hadoop jobs on top of AWS’s S3 and EC2 infrastructure. In addition to the fault tolerance and scalability of Hadoop, EMR brings with it the ability to quickly create, use, and shut down independent Hadoop clusters made up of EC2 instances.
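As a rough illustration of the create, use, and shut down lifecycle, the sketch below builds the request for a transient cluster that terminates itself once its steps finish. It assumes the present-day boto3 `run_job_flow` call rather than whatever client the talk used; the job name, bucket, jar path, instance types, and release label are hypothetical placeholders.

```python
def transient_cluster_request(name, jar_s3_path, jar_args, instance_count=3):
    """Build keyword arguments for boto3's EMR run_job_flow call.

    All concrete values (instance type, release label, role names) are
    illustrative assumptions, not settings from the talk.
    """
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",  # assumed release label
        "Instances": {
            "MasterInstanceType": "m5.xlarge",
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": instance_count,
            # False makes the cluster shut down when its steps are done.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [
            {
                "Name": name + "-step",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {"Jar": jar_s3_path, "Args": list(jar_args)},
            }
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


request = transient_cluster_request(
    "gift-recommender-nightly",                  # hypothetical job name
    "s3://example-bucket/jobs/recommender.jar",  # hypothetical jar location
    ["--date", "2011-07-25"],
)
# With AWS credentials configured, the request could be submitted with:
#   import boto3
#   boto3.client("emr").run_job_flow(**request)
```

Setting `KeepJobFlowAliveWhenNoSteps` to `False` is what makes the cluster short-lived: it exists only for the duration of the job it was created for.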
This talk discusses how this unique Hadoop environment has helped Etsy quickly build data-driven products such as the gift recommender, suggested shops, and the taste test. We’ll start with a cost-based analysis of the benefits of being able to create custom-fit, short-lived Hadoop clusters for specific jobs. We will discuss our in-house toolchain called Barnum and Bailey, which allows us to easily create, deploy, schedule, and monitor these ad-hoc clusters on EMR. Finally, we’ll explain the benefits this approach brings to the test-debug cycle for creating and maintaining jobs.
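The kind of cost comparison mentioned above can be sketched in a few lines. This is a toy model, not Etsy's analysis: the hourly rate, node count, and job duration are made-up numbers, and it assumes each run is billed in whole instance-hours.

```python
import math


def transient_cost(instances, job_hours, hourly_rate, runs_per_month):
    """Monthly cost of launching a right-sized cluster for each run.

    Assumes billing is rounded up to whole instance-hours per run.
    """
    billed_hours = math.ceil(job_hours)
    return instances * billed_hours * hourly_rate * runs_per_month


def always_on_cost(instances, hourly_rate, hours_per_month=720):
    """Monthly cost of keeping a fixed cluster running around the clock."""
    return instances * hourly_rate * hours_per_month


# Hypothetical numbers: a nightly job needing 20 nodes for 1.5 hours.
rate = 0.25  # assumed dollars per instance-hour
per_job = transient_cost(instances=20, job_hours=1.5, hourly_rate=rate, runs_per_month=30)
standing = always_on_cost(instances=20, hourly_rate=rate)
print(f"transient: ${per_job:.2f}/month, always-on: ${standing:.2f}/month")
```

Under these assumed figures the per-job clusters cost a fraction of a permanently running cluster of the same size; the gap narrows as jobs run longer or more often.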
Greg Fodor is currently an engineer on Etsy’s “data wranglers” team, responsible for building products around “big data” at Etsy.