Scala + Cascading = Scalding

Hadoop & Beyond Hadoop: Tools & Technology, Gramercy West (NY Hilton)

Start on low heat with a base of Hadoop; map, then reduce. Flavor, to taste, with Scala’s concise, functional syntax and collections library. Simmer with some Pig bones: a tuple model and high-level join and aggregation operators. Mix in Cascading to hold everything together and boil until it’s very, very hot, and you get Scalding, a new API for MapReduce out of Twitter.

Scalding is an open source Scala framework for concisely describing Hadoop MapReduce jobs. I started the project at Twitter as a way for ad server engineers to run simple queries on the ad logs, without needing to learn a specialized language like Pig, or dive too deeply into the guts of Hadoop. Since then, both the team and the framework have evolved, and it’s now used by 20 or so full-time data scientists, for all of their work: ads targeting, market insight, click-prediction, quality analysis, experiment analysis, and more. In this talk, I’ll walk you from its beginnings, expressing simple jobs like word count in a few lines of code, to its state-of-the-art present: word count is still a few lines of code, but so, for example, is PageRank.
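
To give a feel for the "few lines of code" claim, here is a minimal sketch of word count written against plain Scala collections. It is not the Scalding API itself and does not run on Hadoop; it only illustrates the flatMap-then-group-then-count shape that a Scalding job expresses with similarly named operators. The names `WordCountSketch`, `tokenize`, and `wordCount` are hypothetical, chosen for this illustration.

```scala
// Illustrative only: an in-memory analogue of a word count pipeline,
// using the Scala standard library rather than Scalding or Hadoop.
object WordCountSketch {

  // Lowercase the text, strip punctuation, and split on whitespace.
  // Empty tokens (from blank lines or leading spaces) are dropped.
  def tokenize(text: String): Seq[String] =
    text.toLowerCase
      .replaceAll("[^a-z0-9\\s]", "")
      .split("\\s+")
      .filter(_.nonEmpty)
      .toSeq

  // "Map" step: explode each line into words.
  // "Reduce" step: group identical words and count each group.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(tokenize)
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size }

  def main(args: Array[String]): Unit = {
    val counts = wordCount(Seq("map then reduce", "Map, then simmer"))
    // Print one tab-separated word/count pair per line, sorted by word.
    counts.toSeq.sortBy(_._1).foreach { case (word, n) => println(s"$word\t$n") }
  }
}
```

In a real Scalding job the input and output would be files on HDFS rather than an in-memory `Seq`, and the grouping would be executed as a MapReduce step by Cascading.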

Avi Bryant

Stripe

Avi Bryant founded the company behind Dabble DB, where he also hacked on Seaside, MagLev, and other tasty Smalltalk treats. After that company’s acquisition by Twitter, he spent a while building data tools and products for their ads team. He’s now an engineering manager at Etsy.
