Building a Cloud Culture at Yelp

Jim Blomo (Yelp)
Location: F150 Level: Novice

This talk highlights the areas, both cultural and technological, where
Yelp has changed to best take advantage of new cloud products. The
themes I’ll cover are:

  • History: Yelp’s history as 100% hosted, with Hadoop experiments on
    spare machines
  • Problems with the hosted model: Unreliable data processing and
    extensive coordination around feature launches
  • Usage Today: Mixed usage, including 7+ TB hosted databases, 250+ GB
    of compressed logs per day in S3, and dozens of EMR jobs per day
  • Company Progress: 40 → 80 million monthly visitors with 100s of new
    features across several mediums (website, native mobile apps, mobile
  • How did we get here? Big wins with EMR using open sourced libraries;
    policies around development, privacy, and testing
  • Yelp Features Supported by EMR: Search Relevance, Usage graphs,
    Review Highlights, Spam Filtering, Advertising Optimizations
  • Open source tools: mrjob, EMRio, Tron, s3mysqldump
  • Lessons: Hardest part was not technology adoption, but integration
    into existing workflows and policies. Shared understanding of
    resources available is critical.

Jim Blomo


Jim Blomo (@jimblomo) is passionate about putting data to work by developing robust, elegant systems. At Yelp, he manages a growing data mining team that uses Hadoop, mrjob, and oddjob to process TBs of data. Before Yelp, he built infrastructure for startups and Amazon.

Jim also lectures at UC Berkeley’s School of Information on Data Mining and Web Architecture and has presented at conferences such as AWS re:Invent and Wolfram Alpha Data Summit.

