Presented By O'Reilly and Cloudera
Make Data Work
September 26–27, 2016: Training
September 27–29, 2016: Tutorials & Conference
New York, NY

Creating real-time, data-centric applications with Impala and Kudu

Marcel Kornacker (Cloudera), Todd Lipcon (Cloudera)
2:05pm–2:45pm Wednesday, 09/28/2016
Hadoop use cases
Location: 3D 08
Level: Beginner
Average rating: 4.50 (8 ratings)

Prerequisite knowledge

  • A basic understanding of SQL and the Hadoop ecosystem
What you'll learn

  • Learn how Kudu + Impala can simplify your real-time data-centric applications
Description

    Running real-time data-intensive applications on Apache Hadoop requires complex architectures to store and query data, typically involving multiple independent systems that are tied together through custom-engineered pipelines. A common pattern is to use a NoSQL engine like Apache HBase for caching and later transformations, the results of which are periodically written to HDFS in one of the popular open columnar file formats as a prerequisite for querying by a SQL engine.

    Apache Kudu (incubating), a new scalable distributed storage engine designed for the Hadoop environment, gives the user low-latency single-row access as well as high-throughput bulk data scans. Integrated with Apache Impala (incubating), these capabilities are made available to the user via standard SQL language elements for updates and querying, combining the flexible update functionality of an RDBMS with the performance of a parallel analytic database system.

    Todd Lipcon and Marcel Kornacker explain how to simplify Hadoop-based data-centric applications with the CRUD (create, read, update, and delete) and interactive analytic functionality of Apache Impala (incubating) and Apache Kudu (incubating), offering an introduction to using Impala + Kudu to power your real-time data-centric applications for use cases like time series analysis (fraud detection, stream market data), machine data analytics, and online reporting.
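
    The pattern described above, single-row updates and analytic scans issued through SQL against one table, can be sketched as follows. This is only an illustration under stated assumptions: Python's built-in sqlite3 module stands in for the storage and query engines so the example runs anywhere, it does not use Impala or Kudu or reproduce their SQL dialect, and the `metrics` table, its columns, and `run_demo` are hypothetical names.

```python
import sqlite3


def run_demo():
    """Apply row-level changes and an aggregate scan to the same table.

    Assumption: sqlite3 is a stand-in used purely for illustration; it is
    not Impala or Kudu. Table and column names are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE metrics ("
        " host TEXT, ts INTEGER, value REAL,"
        " PRIMARY KEY (host, ts))"
    )

    # Create: load a few time series points.
    conn.executemany(
        "INSERT INTO metrics VALUES (?, ?, ?)",
        [("a", 1, 10.0), ("a", 2, 20.0), ("b", 1, 5.0), ("b", 2, 7.0)],
    )

    # Update: correct a single row, addressed by its primary key.
    conn.execute(
        "UPDATE metrics SET value = ? WHERE host = ? AND ts = ?",
        (30.0, "a", 2),
    )

    # Delete: drop a single row, again by primary key.
    conn.execute(
        "DELETE FROM metrics WHERE host = ? AND ts = ?",
        ("b", 1),
    )

    # Read (single row): look up one point directly.
    point = conn.execute(
        "SELECT value FROM metrics WHERE host = ? AND ts = ?",
        ("a", 2),
    ).fetchone()[0]

    # Read (analytic scan): aggregate over the whole table, which already
    # reflects the update and delete above without any export step.
    averages = dict(
        conn.execute(
            "SELECT host, AVG(value) FROM metrics GROUP BY host ORDER BY host"
        ).fetchall()
    )

    conn.close()
    return point, averages


if __name__ == "__main__":
    print(run_demo())
```

    The point of the sketch is that the aggregate query sees the row-level changes immediately, because both kinds of access go to the same table rather than to separate systems joined by a pipeline.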

    Marcel Kornacker

    Cloudera

    Marcel Kornacker is a tech lead at Cloudera and the architect of Apache Impala (incubating). Marcel has held engineering jobs at a few database-related startup companies and at Google, where he worked on several ad-serving and storage infrastructure projects. His last engagement was as the tech lead for the distributed query engine component of Google’s F1 project. Marcel holds a PhD in databases from UC Berkeley.

    Todd Lipcon

    Cloudera

    Todd Lipcon is an engineer at Cloudera, where he primarily contributes to open source distributed systems in the Apache Hadoop ecosystem. Previously, he focused on Apache HBase, HDFS, and MapReduce, where he designed and implemented redundant metadata storage for the NameNode (QuorumJournalManager), ZooKeeper-based automatic failover, and numerous performance, durability, and stability improvements. In 2012, Todd founded the Apache Kudu project and has spent the last three years leading this team. Todd is a committer and PMC member on Apache HBase, Hadoop, Thrift, and Kudu, as well as a member of the Apache Software Foundation. Prior to Cloudera, Todd worked on web infrastructure at several startups and researched novel machine learning methods for collaborative filtering. Todd holds a bachelor’s degree with honors from Brown University.

    Comments

    09/29/2016 7:20am EDT

    are slides available? thanks

    Dino Vitale
    09/28/2016 6:24pm EDT

    will the deck be shared?