Presented By O'Reilly and Cloudera
Make Data Work
Sept 29–Oct 1, 2015 • New York, NY

How Spark is working out at Comcast scale

Sridhar Alla (Comcast), Jan Neumann (Comcast)
4:35pm–5:15pm Thursday, 10/01/2015
Spark & Beyond
Location: 1 E20 / 1 E21
Level: Intermediate
Tags: media
Average rating: 3.67 out of 5 (12 ratings)

Spark is redefining the big data ecosystem and opening doors to capabilities that were not available before. Comcast is adopting Spark for several projects, ranging from real-time processing, data science, and large-scale analytics to the Xfinity personalization platform.

This talk will showcase some of these use cases and how we use Spark to solve many of the tough problems at Comcast scale.

1. Real-time processing is a challenging topic, and historically we had to use a mixture of home-grown and off-the-shelf technologies to get a working model. We will talk about how we use Kafka to collect events from over 40 million set-top boxes (STBs) and hand them over to Spark Streaming, which processes the events.

2. Data science is a very challenging field, and new requirements pop up all the time. We are looking at Spark to build a reliable, flexible framework that integrates with Hadoop, HBase, and Solr and that eases migration for data scientists who work in R and SAS.

3. Comcast provides personalized recommendations to its customers on the X1 platform. Our initial implementation was built on the Hadoop MapReduce framework using a batch computation model. When we wanted to explore how we could offer real-time recommendations, we looked to Spark because of its greater computational efficiency and the ease of developing both streaming and batch processing solutions from the same code base.

In short, there are many fields that can profit from the plethora of capabilities provided by Spark.

Sridhar Alla

Comcast

Sridhar Alla is the director of big data solutions and architecture at Comcast, where he has delivered several key solutions, such as the Xfinity personalization platform, clickthrough analytics, and the correlation platform. Sridhar started his career working on network appliances, focusing on NAS and caching technologies. Previously, he served as the CTO of security company eIQNetworks, where he merged the concepts of big data and security products. He holds patents on very large-scale processing algorithms and caching.

Jan Neumann

Comcast

Jan Neumann manages the research group at Comcast Labs DC, where he and his team focus on using machine learning and large-scale computing for content discovery, multimedia information extraction, and big data analysis, with the goal of innovating the TV and home consumer experience. Before Comcast, he worked for Siemens Corporate Research on various computer vision projects. He holds a Ph.D. in Computer Science from the University of Maryland, College Park.