Historically, use cases such as time series and mutable-profile datasets have been possible but difficult to achieve efficiently using traditional HDFS storage engines. These solutions might involve complex ingestion paths, deep understanding of file types, and compaction strategies. With the introduction of Kudu, many of these difficulties are eliminated. At the same time, interest in streaming solutions and low-latency analytics has surged with the growing popularity of tools like Apache Kafka.
Ted Malaska and Jeff Holoman explain how to go from zero to full-on time series and mutable profile systems in 40 minutes. Ted and Jeff cover code examples of ingestion from Kafka and Spark Streaming and access through SQL, Spark, and Spark SQL to explore the underlying theories and design patterns that will be common for most solutions with Kudu.
Ted Malaska is a senior solution architect at Blizzard. Previously, he was a principal solutions architect at Cloudera. Ted has 18 years of professional experience working for startups, the US government, some of the world’s largest banks, commercial firms, bio firms, retail firms, hardware appliance firms, and the largest nonprofit financial regulator in the US and has worked on close to one hundred clusters for over two dozen clients with over hundreds of use cases. He has architecture experience across topics including Hadoop, Web 2.0, mobile, SOA (ESB, BPM), and big data. Ted is a regular contributor to the Hadoop, HBase, and Spark projects, a regular committer to Flume, Avro, Pig, and YARN, and the coauthor of Hadoop Application Architectures.
Jeff Holoman is a systems engineer at Cloudera. Jeff is a Kafka contributor and has focused on helping customers with large-scale Hadoop deployments, primarily in financial services. Prior to his time at Cloudera, Jeff worked as an application developer, system administrator, and Oracle technology specialist.
©2016, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • firstname.lastname@example.org
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse, or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.