Yarns about YARN: Migrating to MapReduce v2

Kathleen Ting (Cloudera)
Hadoop & Beyond
Location: 120-121
Average rating: 4.50 (4 ratings)

The job throughput and cluster utilization benefits of YARN and MapReduce v2 on Apache Hadoop are widely known. Who wouldn’t want job throughput increased by 2x? You’ve most likely heard (repeatedly) about the key benefits of migrating your Hadoop cluster from MapReduce v1 to YARN: improved job throughput and cluster utilization, plus the ability to run different computational frameworks on Hadoop. What you probably haven’t heard about are the configuration tweaks needed to ensure your existing MR v1 jobs run on your YARN cluster, or the YARN-specific configuration settings. In this session we’ll start with a list of recommended YARN configurations, then step through the most common use cases we’ve seen in the field. Production migrations can quickly go awry without proper guidance. Learn from others’ misconfigurations to get your YARN cluster configured right the first time.

Kathleen Ting

Cloudera

Kathleen Ting (@kate_ting) is currently a technical account manager at Cloudera, where she helps strategic customers deploy and use the Apache Hadoop ecosystem in production. She’s a frequent conference speaker, has contributed to several projects in the open source community, and is a committer and PMC member for Apache Sqoop. Kathleen is also a co-author of O’Reilly’s Apache Sqoop Cookbook.