Hadoop is commonly used for processing large swaths of data in batch. While many of the necessary building blocks for data processing exist within the Hadoop ecosystem – HDFS, MapReduce, HBase, Hive, Pig, Oozie, and so on – it can be a challenge to assemble and operationalize them as a production ETL platform. This presentation covers one approach to data ingest, organization, format selection, process orchestration, and external system integration, based on collective experience acquired across many production Hadoop deployments.
Eric Sammer is currently a Principal Solution Architect at Cloudera, where he helps customers plan, deploy, develop for, and use Hadoop and related projects at scale. His background is in the development and operations of distributed, highly concurrent data ingest and processing systems. He’s been involved in the open source community and has contributed to a large number of projects over the last decade.