Presented By O'Reilly and Cloudera
Make Data Work
March 28–29, 2016: Training
March 29–31, 2016: Conference
San Jose, CA

Fast data made easy with Apache Kafka and Apache Kudu (incubating)

Ted Malaska (Capital One), Jeff Holoman (Cloudera)
11:00am–11:40am Thursday, 03/31/2016
IoT and Real-time

Location: 210 C/G
Tags: real-time
Average rating: 4.50 (10 ratings)

Prerequisite knowledge

Attendees should have a basic knowledge of distributed systems.


Historically, use cases such as time series and mutable-profile datasets have been possible but difficult to achieve efficiently using traditional HDFS storage engines. These solutions might involve complex ingestion paths, deep understanding of file types, and compaction strategies. With the introduction of Kudu, many of these difficulties are eliminated. At the same time, interest in streaming solutions and low-latency analytics has surged with the growing popularity of tools like Apache Kafka.

Ted Malaska and Jeff Holoman explain how to go from zero to full-on time-series and mutable-profile systems in 40 minutes. Ted and Jeff cover code examples of ingestion from Kafka via Spark Streaming and of access through SQL, Spark, and Spark SQL, exploring the underlying theories and design patterns that will be common to most solutions built on Kudu.
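The Kafka-to-Kudu ingestion pattern the session describes can be sketched roughly as below. This is a minimal illustration, not code from the talk: the Kudu master address, Kafka broker, topic and table names, and the ProfileEvent schema are all invented for the example, and the kudu-spark and spark-streaming-kafka APIs shown reflect roughly the 2016-era versions, so exact class names and constructors may differ in your release.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils
import org.apache.kudu.spark.kudu.KuduContext
import kafka.serializer.StringDecoder

// Hypothetical schema for a mutable-profile table keyed by userId.
case class ProfileEvent(userId: String, lastSeen: Long, region: String)

object KafkaToKudu {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-to-kudu")
    val ssc = new StreamingContext(conf, Seconds(5))
    val sqlContext = SQLContext.getOrCreate(ssc.sparkContext)
    import sqlContext.implicits._

    // Placeholder Kudu master address; the "profiles" table is assumed
    // to exist with userId as its primary key.
    val kuduContext = new KuduContext("kudu-master:7051")

    // Direct stream from a hypothetical "profile-events" topic.
    val kafkaParams = Map("metadata.broker.list" -> "broker:9092")
    val stream = KafkaUtils
      .createDirectStream[String, String, StringDecoder, StringDecoder](
        ssc, kafkaParams, Set("profile-events"))

    stream.foreachRDD { rdd =>
      // Parse comma-separated payloads into typed rows; real code
      // would validate and handle malformed records.
      val df = rdd.map(_._2.split(","))
        .map(f => ProfileEvent(f(0), f(1).toLong, f(2)))
        .toDF()
      // Upsert keeps only the latest profile per primary key -- the
      // read-modify-write that Kudu handles natively and that is
      // painful to emulate on immutable HDFS file formats.
      kuduContext.upsertRows(df, "profiles")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Because Kudu upserts by primary key, the stream needs no compaction jobs or partition-swap tricks; the same "profiles" table can be queried concurrently through Impala SQL or Spark SQL while ingestion runs.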


Ted Malaska

Capital One

Ted Malaska is a director of enterprise architecture at Capital One. Previously, he was the director of engineering in the Global Insight Department at Blizzard; principal solutions architect at Cloudera, helping clients find success with the Hadoop ecosystem; and a lead architect at the Financial Industry Regulatory Authority (FINRA). He has contributed code to Apache Flume, Apache Avro, Apache YARN, Apache HDFS, Apache Spark, Apache Sqoop, and many more. Ted is a coauthor of Hadoop Application Architectures, a frequent conference speaker, and a frequent blogger on data architectures.


Jeff Holoman

Cloudera

Jeff Holoman is a systems engineer at Cloudera. Jeff is a Kafka contributor and has focused on helping customers with large-scale Hadoop deployments, primarily in financial services. Prior to his time at Cloudera, Jeff worked as an application developer, system administrator, and Oracle technology specialist.