Kudu is redefining the big data ecosystem and opening doors to capabilities not previously available. Sridhar Alla and Shekhar Agrawal explain how Comcast has deployed the largest Kudu cluster thus far and is rapidly developing advanced applications that provide real-time analytics at petabyte scale while avoiding expensive denormalization processes. They also cover how real-time analytics on Kudu scale much further than on other NoSQL databases.
Sridhar and Shekhar share practical implementation details and discuss extensive benchmarks on tables of 1 trillion events. While the Spark platform processes both the historical data and the real-time events streaming through Kafka, the middle tier accesses Kudu tables to generate subsecond real-time dashboards while still having the power of Hadoop to deliver batch analytics and integrations with other platforms. This is key to the success of the platform: previously, Comcast had to rely on a variety of multitiered architectures to provide fast storage that could still be updated like a NoSQL engine, but without the lag caused by several thousand updates per second.
Sridhar Alla is cofounder and CTO at BlueWhale, which brings together the worlds of big data and artificial intelligence to provide comprehensive solutions to meet the business needs of organizations of all sizes. He and his team are cloud and tool agnostic and strive to embed themselves into the workstream to provide strategic and technical assistance. Sridhar is also an avid speaker, author, and coach. He lives in southern New Jersey with his wife and daughter.
Shekhar Agrawal is the director of data science at Comcast. Shekhar is an expert data scientist specializing in text analytics and NLP. He currently leads several petabyte-scale modeling initiatives to improve customer experience.
©2017, O'Reilly Media, Inc. All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with, and does not endorse or review, the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.