
Hadoop From Batch Data Processing to Real-time Query and Streaming Platform

Peter Sirota (Amazon Web Services)
Ballroom H
Average rating: 3.00 (4 ratings)

By pairing the elasticity and pay-as-you-go pricing of the cloud with the flexibility and scalability of Hadoop, Amazon Elastic MapReduce has made Hadoop even more accessible to companies looking to maximize the value of their data. Each day, users ranging from university students to Fortune 50 companies run tens of thousands of Hadoop clusters on the Amazon Elastic MapReduce infrastructure.

Hadoop was originally used as a batch analytics tool, but this is changing rapidly as applications move toward real-time processing and streaming. In this session, you will learn from the Amazon Elastic MapReduce team’s recent experience with streaming services such as Amazon Kinesis and low-latency query engines such as Impala and Phoenix. We’ll walk through the implementation details of our Hadoop InputFormat for Amazon Kinesis and demonstrate how existing Hadoop ecosystem technologies can be applied to real-time data. The presentation will also showcase features and frameworks for continuous processing in Hadoop.

This session is sponsored by Amazon


Peter Sirota

General Manager, Amazon Elastic MapReduce, Amazon Web Services

Peter Sirota is the General Manager of Amazon Elastic MapReduce, a managed Hadoop web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data. Before starting Amazon Elastic MapReduce, Peter led AWS Platform teams responsible for billing, authentication, portal, and Amazon DevPay services. Peter holds a bachelor’s degree in computer science from Northeastern University.