Hadoop is commonly used for batch processing of large volumes of data. While many of the necessary building blocks for data processing exist within the Hadoop ecosystem – HDFS, MapReduce, HBase, Hive, Pig, Oozie, and so on – assembling and operationalizing them as a production ETL platform can be a challenge. This presentation covers one approach to data ingest, organization, format selection, process orchestration, and external system integration, based on collective experience acquired across many production Hadoop deployments.
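As a rough illustration of the kind of ingest and data-organization conventions the abstract alludes to, the sketch below writes a batch of records into a date-partitioned HDFS directory layout using the standard Hadoop FileSystem API. The dataset name, directory scheme, and raw-text file format are illustrative assumptions for this sketch, not the approach presented in the session.

import java.io.IOException;
import java.io.OutputStream;
import java.time.LocalDate;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Minimal sketch: land a batch of ingested records in HDFS under a
 * dataset/date-partitioned layout so downstream Hive, Pig, or Oozie jobs
 * can locate them predictably. Layout and names are hypothetical.
 */
public class PartitionedIngest {

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();   // reads core-site.xml / hdfs-site.xml from the classpath
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical convention: /data/<dataset>/year=YYYY/month=MM/day=DD/
        String dataset = "web_logs";
        LocalDate today = LocalDate.now();
        Path partition = new Path(String.format(
                "/data/%s/year=%d/month=%02d/day=%02d",
                dataset, today.getYear(), today.getMonthValue(), today.getDayOfMonth()));

        fs.mkdirs(partition);   // creates parent directories as needed; a no-op if it already exists

        // Write one batch file into the partition. A real pipeline would typically
        // use a container format such as Avro or SequenceFile rather than raw text.
        Path file = new Path(partition, "part-00000");
        try (OutputStream out = fs.create(file, true)) {
            out.write("example record\n".getBytes("UTF-8"));
        }

        System.out.println("Wrote batch to " + file);
    }
}

In a layout like this, process orchestration (for example with Oozie) can be driven by the arrival of a partition, and the same directories can be exposed to Hive as partitioned tables.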
Eric Sammer is currently a Principal Solution Architect at Cloudera, where he helps customers plan, deploy, develop for, and use Hadoop and related projects at scale. His background is in the development and operation of distributed, highly concurrent data ingest and processing systems. He has been involved in the open source community and has contributed to a large number of projects over the last decade.
Comments
The slides for this talk are now available on SlideShare, and can be accessed via the “external link” above.
Hi Bharath,
We’ll be posting the slides if the speaker provides them to us.
Thank you, Sophia
Will someone be posting the slides for this presentation?