Presented By O'Reilly and Cloudera
Make Data Work
March 13–14, 2017: Training
March 14–16, 2017: Tutorials & Conference
San Jose, CA

Real-time analytics using Kudu at petabyte scale

Sridhar Alla (Blue Whale), Shekhar Agrawal (Comcast)
4:20pm–5:00pm Wednesday, March 15, 2017
Real-time applications, Stream processing and analytics
Location: LL20 A
Level: Intermediate
Secondary topics: Architecture, Media, Platform
Average rating: *****
(5.00, 2 ratings)

Who is this presentation for?

  • Engineers, architects, and developers

Prerequisite knowledge

  • Familiarity with the Hadoop ecosystem
  • A basic knowledge of MapReduce and HDFS

What you'll learn

  • The practical aspects of the Kudu storage system, including how to use Spark with Kudu to provide fast analytics on huge datasets

Description

Kudu is redefining the big data ecosystem and opening doors to capabilities not previously available. Sridhar Alla and Shekhar Agrawal explain how Comcast has deployed the largest Kudu cluster thus far and is rapidly developing advanced applications that provide real-time analytics at petabyte scale while avoiding expensive denormalization processes. They also cover how real-time analytics on Kudu scales much further than on other NoSQL databases.

Sridhar and Shekhar share practical implementation details and discuss extensive benchmarks on tables of 1 trillion events. The Spark platform processes both the historical data and the real-time events streaming through Kafka, while the middle tier accesses Kudu tables to generate subsecond real-time dashboards. The platform still retains the power of Hadoop to deliver batch analytics and integrations with other platforms. This is key to the platform's success: previously, Comcast had to rely on a variety of multitiered architectures to get fast storage that could still be updated the way NoSQL engines allow, without the lag caused by several thousand updates per second.

Sridhar Alla

Blue Whale

Sridhar Alla is cofounder and CTO at BlueWhale, which brings together the worlds of big data and artificial intelligence to provide comprehensive solutions to meet the business needs of organizations of all sizes. He and his team are cloud and tool agnostic and strive to embed themselves into the workstream to provide strategic and technical assistance. Sridhar is also an avid speaker, author, and coach. He lives in southern New Jersey with his wife and daughter.

Shekhar Agrawal

Comcast

Shekhar Agrawal is the director of data science at Comcast. Shekhar is an expert data scientist with specialization in the text and NLP fields. He currently handles several PB-scale modeling initiatives to improve customer experience factors.

Comments

James Skinner | STRATEGIC ACCOUNT EXECUTIVE
03/16/2017 12:32am PDT

Hi – is there a video for this session that I can view? Thanks!