The talk will start with an overview of the architecture of Apache Apex, a big data streaming analytics platform. Apex comes with a powerful stream processing engine that has built-in scalability and fault tolerance, a rich set of functional building blocks, and an easy-to-use API for developers building real-time and batch applications. Apex runs natively on Hadoop YARN and HDFS and is used in production in various industries.
The talk will then walk through building a streaming application on Apex, using ingestion ETL as the example. The application streams data from a real-time messaging system such as Kafka, processes it, and stores the results in an external data store. Along the way, the talk will introduce features such as scalability, fault tolerance, and end-to-end exactly-once processing, and show how each is achieved for the application.
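As a rough illustration of what end-to-end exactly-once can mean for an ingestion pipeline like this, here is a minimal sketch in plain Python. It does not use the Apex API and is not necessarily the approach the talk presents. It assumes one common pattern: input is grouped into numbered windows, the pipeline replays from an earlier window after a failure, and the sink skips any window it has already committed. `IdempotentStore`, `run_pipeline`, and the sample data are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class IdempotentStore:
    """Hypothetical external store that remembers the last committed window."""
    rows: dict = field(default_factory=dict)
    last_committed_window: int = -1

    def commit_window(self, window_id, records):
        # A replayed window that was already committed is skipped,
        # so its records are never applied twice.
        if window_id <= self.last_committed_window:
            return False
        # A real store would write the rows and the window id in one
        # transaction; here both updates simply happen together in memory.
        for key, value in records:
            self.rows[key] = self.rows.get(key, 0) + value
        self.last_committed_window = window_id
        return True


def run_pipeline(source, store, start_window=0, fail_after_window=None):
    """Read windows from `source`, normalize keys, and write to `store`."""
    for window_id in range(start_window, len(source)):
        # "Process" step of the ETL: lowercase the keys.
        processed = [(key.lower(), value) for key, value in source[window_id]]
        store.commit_window(window_id, processed)
        if fail_after_window == window_id:
            raise RuntimeError("simulated operator failure")


# Stand-in for a replayable message source such as a Kafka topic.
source = [[("A", 1), ("b", 2)], [("a", 3)], [("B", 4)]]
store = IdempotentStore()

try:
    # First run fails right after window 1 has been committed.
    run_pipeline(source, store, fail_after_window=1)
except RuntimeError:
    pass

# Recovery replays from window 0; windows 0 and 1 are skipped by the store.
run_pipeline(source, store, start_window=0)
print(store.rows)
```

The sketch relies on the source being replayable and on the sink recording the window id together with the data; without both, a replay would either lose records or apply them twice.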
The talk will end by showcasing use cases where Apex is used in production today to solve similar ingestion-related problems.
Pramod Immaneni is a PMC member of Apache Apex and lead architect at DataTorrent Inc, where he works on the Apex platform and specializes in big data applications. Prior to DataTorrent, he was a founder of technology startups. He was CTO of Leaf Networks, a company he co-founded that was later acquired by Netgear Inc. He built products in the core networking space and holds patents in peer-to-peer VPNs. Before that, he was involved in starting a company where he architected a dynamic content customization engine for mobile devices.