The Apache Hadoop Map-Reduce framework is clearly showing its age.
In particular, the Map-Reduce JobTracker needs a drastic overhaul to address several technical deficiencies in its memory consumption, threading model, and scalability, reliability, and performance, given observed trends in cluster sizes and workloads. Periodically, we have done running repairs. Lately, however, these have come at an ever-growing cost, as evinced by the worryingly regular site-up issues we have seen in the past year. The architectural deficiencies, and the corrective measures, are both old and well understood, dating as far back as late 2007: https://issues.apache.org/jira/browse/MAPREDUCE-278.
The most pressing requirements for the next generation of the Map-Reduce framework are:
The fundamental idea of YARN is to split the two major functions of the JobTracker, resource management and job scheduling/monitoring, into separate daemons: a global ResourceManager (RM) and a per-application ApplicationMaster (AM). An application is either a single job in the classical sense of Map-Reduce jobs or a DAG of jobs. The ResourceManager and the per-node slave, the NodeManager (NM), form the data-computation framework. The ResourceManager is the ultimate authority that arbitrates resources among all the applications in the system. The per-application ApplicationMaster is, in effect, a framework-specific library tasked with negotiating resources from the ResourceManager and working with the NodeManager(s) to execute and monitor the tasks.
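The division of labor described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the class and method names (`allocate`, `launch`, `run`) and the notion of integer "slots" are hypothetical and are not the actual YARN API. The model ignores scheduling policy, failures, and networking, and shows only who talks to whom.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Per-node agent (toy model): tracks free slots and records launched tasks."""
    name: str
    free_slots: int
    running: list = field(default_factory=list)

    def launch(self, app_id, task):
        # In this sketch, "running" a task just means recording it.
        self.running.append((app_id, task))


class ResourceManager:
    """Global authority that arbitrates slots among all applications."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.grants = {}  # app_id -> number of slots granted so far

    def allocate(self, app_id, wanted):
        """Grant up to `wanted` slots; returns one NodeManager per granted slot."""
        granted = []
        for node in self.nodes:
            while node.free_slots > 0 and len(granted) < wanted:
                node.free_slots -= 1
                granted.append(node)
        self.grants[app_id] = self.grants.get(app_id, 0) + len(granted)
        return granted


class ApplicationMaster:
    """Per-application library: negotiates with the RM, then drives the NMs."""

    def __init__(self, app_id, tasks):
        self.app_id = app_id
        self.pending = list(tasks)

    def run(self, rm):
        # Step 1: negotiate resources from the ResourceManager.
        containers = rm.allocate(self.app_id, len(self.pending))
        # Step 2: work with the NodeManagers to execute the tasks.
        launched = []
        for node in containers:
            task = self.pending.pop(0)
            node.launch(self.app_id, task)
            launched.append((node.name, task))
        return launched
```

In this sketch the ResourceManager knows nothing about what a task is; it only hands out slots. All application-specific logic, such as which task to run next and what to do with tasks that did not get a slot, lives in the ApplicationMaster, which is the property that lets other programming paradigms share the same cluster.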
This talk will cover the YARN design and architecture in more depth, how it improves Apache Hadoop's ability to process data via Hadoop Map-Reduce, and how it allows for other programming paradigms on Hadoop grids.
Arun is the lead of the next generation MapReduce project in Apache Hadoop. Arun has been a full-time contributor to Apache Hadoop since its inception in 2006. He is a long-time committer and member of the Apache Hadoop PMC and jointly holds the current world sorting record using Apache Hadoop. Prior to co-founding Hortonworks, Arun was responsible for all MapReduce code and configuration deployed across the 42,000+ servers at Yahoo!. In essence, he was responsible for running Apache Hadoop’s MapReduce as a service for Yahoo!. Follow Arun on Twitter: @acmurthy.