Presented By O'Reilly and Cloudera
Make Data Work
22–23 May 2017: Training
23–25 May 2017: Tutorials & Conference
London, UK

Creating real-time, data-centric applications with Impala and Kudu

Marcel Kornacker (Cloudera)
11:15–11:55 Wednesday, 24 May 2017
Hadoop platform and applications
Location: Capital Suite 13
Level: Beginner
Average rating: 4.12 (8 ratings)

Who is this presentation for?

  • Architects, analysts, and developers

Prerequisite knowledge

  • A basic understanding of SQL and the Hadoop ecosystem

What you'll learn

  • Understand how Kudu and Impala can simplify your real-time data-centric applications


Running real-time data-intensive applications on Apache Hadoop requires complex architectures to store and query data, typically involving multiple independent systems that are tied together through custom-engineered pipelines. A common pattern is to use a NoSQL engine like Apache HBase for caching and later transformations, the results of which are periodically written to HDFS in one of the popular open columnar file formats as a prerequisite for querying by a SQL engine.

Apache Kudu (incubating), a new scalable distributed storage engine designed for the Hadoop environment, gives the user low-latency single-row access as well as high-throughput bulk data scans. Integrated with Apache Impala (incubating), these capabilities are made available to the user via standard SQL language elements for updates and querying, combining the flexible update functionality of an RDBMS with the performance of a parallel analytic database system.

Marcel Kornacker explains how to simplify Hadoop-based data-centric applications with the CRUD (create, read, update, and delete) and interactive analytic functionality of Apache Impala (incubating) and Apache Kudu (incubating). The talk offers an introduction to using Impala and Kudu to power real-time data-centric applications for use cases like time series analysis (fraud detection, streaming market data), machine data analytics, and online reporting.
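
As a rough illustration of the access pattern described in this abstract (keyed single-row changes and analytic scans issued through SQL against one table), here is a minimal sketch. It is not taken from the talk. Python's built-in sqlite3 module is used purely as a local stand-in so the example runs anywhere; it is not Impala or Kudu, and the table name, columns, and values are hypothetical. In Impala, a Kudu-backed table would instead be declared with a PRIMARY KEY and STORED AS KUDU, so the DDL differs from what is shown here.

```python
import sqlite3

# Stand-in only: sqlite3 lets this sketch run locally. It is NOT Impala or Kudu.
# The "trades" table and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trades ("
    "trade_id INTEGER PRIMARY KEY, symbol TEXT, price REAL, flagged INTEGER)"
)

# Create: insert a few rows.
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?, ?)",
    [(1, "ABC", 10.0, 0), (2, "ABC", 12.0, 0), (3, "XYZ", 50.0, 0)],
)

# Update and delete: single-row changes addressed by primary key.
conn.execute("UPDATE trades SET flagged = 1 WHERE trade_id = 2")
conn.execute("DELETE FROM trades WHERE trade_id = 3")

# Read: a single-row lookup by key.
row = conn.execute(
    "SELECT symbol, price, flagged FROM trades WHERE trade_id = 2"
).fetchone()

# Analytic scan: an aggregate over the same table, with no export step.
summary = conn.execute(
    "SELECT symbol, COUNT(*), AVG(price) FROM trades "
    "GROUP BY symbol ORDER BY symbol"
).fetchall()

print(row)
print(summary)
```

The point of the sketch is only that row-level changes and aggregate queries go through the same SQL interface to the same table, rather than through separate systems joined by a pipeline.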

Marcel Kornacker


Marcel Kornacker is a tech lead at Cloudera and the architect of Apache Impala (incubating). Marcel has held engineering jobs at a few database-related startup companies and at Google, where he worked on several ad-serving and storage infrastructure projects. His last engagement was as the tech lead for the distributed query engine component of Google’s F1 project. Marcel holds a PhD in databases from UC Berkeley.