Presented By O'Reilly and Cloudera
Make Data Work
Feb 17–20, 2015 • San Jose, CA

Yarns about YARN: Migrating to MapReduce v2

Kathleen Ting (Cloudera), Miklos Christine (Databricks)
4:00pm–4:40pm Thursday, 02/19/2015
Hadoop Platform
Location: 210 B/F
Average rating: 3.50 (2 ratings)

The job throughput and cluster utilization benefits of YARN and MapReduce v2 on Apache Hadoop are widely known. Who wouldn’t want job throughput increased by 2x? Most likely you’ve heard (repeatedly) about the key benefits of migrating your Hadoop cluster from MapReduce v1 to YARN: improved job throughput and cluster utilization, plus the ability to run different computational frameworks on Hadoop. What you probably haven’t heard about are the configuration tweaks needed to ensure your existing MR v1 jobs can run on your YARN cluster, or the YARN-specific configuration settings themselves. In this session we’ll start with a list of recommended YARN configurations, and then step through the most common use cases we’ve seen in the field. Production migrations can quickly go awry without proper guidance. Learn from others’ misconfigurations to get your YARN cluster configured right the first time.

Kathleen Ting

Cloudera

Kathleen Ting (@kate_ting) is a technical account manager at Cloudera, where she helps strategic customers deploy and use the Apache Hadoop ecosystem in production. She’s a frequent conference speaker, has contributed to several projects in the open source community, and is a committer and PMC member on Apache Sqoop. Kathleen is also a co-author of O’Reilly’s Apache Sqoop Cookbook.

Miklos Christine

Databricks

Miklos Christine is a solutions engineer at Databricks. Miklos was previously a system engineer at Cloudera, where he helped strategic customers deploy and use the Apache Hadoop ecosystem in production. He has contributed to several projects in the open source community and previously worked on the design and implementation of the system infrastructure for the OS that runs on Cisco’s routers and switches. He holds a BS in electrical engineering and computer sciences from the University of California, Berkeley.