Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

Building a scalable streaming ingestion application with exactly-once semantics using Apache Apex

Pramod Immaneni (DataTorrent)

Who is this presentation for?

Engineers and architects

Prerequisite knowledge

Familiarity with the basic Hadoop architecture and ecosystem will be useful. A software background or programming experience will help attendees get the most out of the talk.

What you'll learn

The audience will understand the complexities of ingesting and processing streaming data at scale. They will learn about the Apache Apex platform and how to build streaming applications on it that address their needs. They will also see how others in the industry are using Apex to solve similar problems.


The talk will start with an overview of the architecture of Apache Apex, a big data streaming analytics platform. Apex includes a powerful stream processing engine with built-in scalability and fault tolerance, a rich set of functional building blocks, and an easy-to-use API that developers can use to build real-time and batch applications. Apex runs natively on Hadoop YARN and HDFS and is used in production across various industries.

The talk will then walk through building a streaming application on Apex, using ingestion ETL as the example. The application streams data from a real-time messaging system such as Kafka, processes it, and stores the results in an external data store. Along the way, the talk introduces features such as scalability, fault tolerance, and end-to-end exactly-once processing, and shows how the application achieves each of them.
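To make the end-to-end exactly-once idea more concrete, here is a minimal sketch in plain Java. It is not the Apex or Malhar API; every class and method name in it (`ExternalStore`, `IdempotentSink`, `endWindow`, `run`) is hypothetical. It assumes a replayable source like Kafka, where records are grouped into numbered windows and recovery replays windows from the last checkpoint (at-least-once delivery). It also assumes a sink that stores the last committed window ID together with the data, so replayed windows are skipped instead of written twice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch (plain Java, not the Apex API): exactly-once output built from
// at-least-once replay plus a sink that remembers the last committed window ID.
public class ExactlyOnceSketch {

    // Stand-in for an external store. A real store would write the rows and the
    // window ID atomically (e.g. in one transaction); here both are in memory.
    static class ExternalStore {
        final List<String> rows = new ArrayList<>();
        long lastCommittedWindow = -1;

        void commit(long windowId, List<String> windowRows) {
            rows.addAll(windowRows);
            lastCommittedWindow = windowId;
        }
    }

    // Sink that buffers tuples for the current window and skips any window the
    // store has already committed (i.e. a window replayed after a failure).
    static class IdempotentSink {
        private final ExternalStore store;
        private final List<String> buffer = new ArrayList<>();

        IdempotentSink(ExternalStore store) {
            this.store = store;
        }

        void process(String tuple) {
            buffer.add(tuple);
        }

        void endWindow(long windowId) {
            if (windowId > store.lastCommittedWindow) {
                store.commit(windowId, new ArrayList<>(buffer));
            }
            buffer.clear(); // tuples of an already-committed window are dropped here
        }
    }

    // Pushes windows [from, to] of a replayable source through a trivial
    // transform step (upper-casing) and into the sink.
    static void run(List<List<String>> source, int from, int to, IdempotentSink sink) {
        for (int w = from; w <= to; w++) {
            for (String record : source.get(w)) {
                sink.process(record.toUpperCase());
            }
            sink.endWindow(w);
        }
    }

    static List<String> demo() {
        List<List<String>> source = Arrays.asList(
                Arrays.asList("a", "b"), Arrays.asList("c"),
                Arrays.asList("d", "e"), Arrays.asList("f"));
        ExternalStore store = new ExternalStore();

        // First attempt: windows 0..2 are committed, then the process "fails".
        run(source, 0, 2, new IdempotentSink(store));

        // Recovery from a checkpoint taken after window 0: windows 1..3 are replayed.
        run(source, 1, 3, new IdempotentSink(store));
        return store.rows;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [A, B, C, D, E, F]
    }
}
```

Replay alone would write windows 1 and 2 twice. The window-ID check in the sink turns at-least-once delivery into exactly-once output, but only if the store persists the rows and the window ID atomically.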

The talk will end by showcasing use cases where Apex is used in production today to solve similar ingestion problems.


Pramod Immaneni


Pramod Immaneni is a PMC member of Apache Apex and lead architect at DataTorrent Inc., where he works on the Apex platform and specializes in big data applications. Before DataTorrent, he founded technology startups. He was CTO of Leaf Networks, a company he co-founded that was later acquired by Netgear Inc. He built products in the core networking space and holds patents in peer-to-peer VPNs. Before that, he helped start a company where he architected a dynamic content customization engine for mobile devices.