Spark is redefining the big data ecosystem and opening doors to capabilities that were not available before. Comcast is moving toward adopting Spark for several projects, ranging from real-time processing, data science, and large-scale analytics to the Xfinity personalization platform.
This talk will showcase some of the use cases and how we use Spark to solve many of the tough problems at Comcast scale.
1. Real-time processing is a challenging topic, and historically we had to use a mixture of home-grown and off-the-shelf technologies to get a working model. We will talk about how we use Kafka to collect events from over 40 million set-top boxes (STBs) and hand them over to Spark Streaming, which processes the events.
2. Data science is a very challenging field, and new requirements pop up all the time. We are looking at Spark to build a reliable, flexible framework that integrates with Hadoop, HBase, and Solr and eases migration for data scientists who work in R and SAS.
3. Comcast provides personalized recommendations to its customers on the X1 platform. Our initial implementation was built on the Hadoop MapReduce framework using a batch computation model. When we wanted to explore how we could offer real-time recommendations, we looked to Spark because of its greater computational efficiency and the ease of developing both streaming and batch processing solutions from the same code base.
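The point in the third use case about one code base serving both batch and streaming can be illustrated with a small sketch. It is plain Python and uses no Spark API; the event shape and every function name here are hypothetical and are not taken from the Comcast implementation. The idea is only that the core logic lives in one function that both the batch path and the micro-batch path call.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

# Hypothetical event shape: (customer_id, program_genre). Not from the talk.
Event = Tuple[str, str]


def count_genres(events: Iterable[Event]) -> Counter:
    """Shared logic: count how often each customer watched each genre."""
    return Counter(events)


def top_genre(counts: Counter, customer: str) -> str:
    """Return the customer's most-watched genre (ties broken alphabetically)."""
    candidates = [(n, genre) for (cust, genre), n in counts.items() if cust == customer]
    if not candidates:
        raise KeyError(customer)
    best = max(n for n, _ in candidates)
    return min(genre for n, genre in candidates if n == best)


def batch_job(events: List[Event]) -> Counter:
    """Batch path: process the full history in one pass."""
    return count_genres(events)


def streaming_job(micro_batches: Iterable[List[Event]]) -> Iterator[Counter]:
    """Streaming path: fold each micro-batch into running state,
    reusing the same count_genres function as the batch path."""
    state: Counter = Counter()
    for batch in micro_batches:
        state.update(count_genres(batch))
        yield Counter(state)  # snapshot of the state after this micro-batch
```

Feeding the same events to `batch_job` in one list, or to `streaming_job` split into micro-batches, ends in the same counts, so a recommendation derived from `top_genre` does not depend on which path produced the state.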
In short, there are many fields that can profit from the plethora of capabilities provided by Spark.
Sridhar Alla is director of data science and engineering at Comcast. A big data expert, Sridhar has over his career helped companies large and small solve complex problems such as data warehousing, governance, security, real-time processing, high-frequency trading, and establishing large-scale data science practices. Previously, he was the chief technology officer at cybersecurity firm eIQNetworks and a storage software engineer at Network Appliance. Sridhar is a certified Agile DevOps practitioner and implementer. He is an avid presenter at conferences, including Strata + Hadoop World and Spark Summit. Sridhar also provides onsite and online training for several technologies. He has several patents filed with the USPTO on large-scale computing and distributed systems. Sridhar holds a bachelor’s degree in computer science from JNTU in Hyderabad, India. He lives with his wife in New Jersey.
Jan Neumann leads Comcast’s Applied Artificial Intelligence Research Group, which combines large-scale machine learning, deep learning, NLP, and computer vision to develop novel algorithms and product concepts such as voice interfaces, virtual assistants, and video and IoT analytics that improve the experience of Comcast’s customers. Previously, Jan worked for Siemens Corporate Research on various computer vision-related projects, such as driver assistance systems and video surveillance. He has published over 20 papers in scientific conferences and journals and is a frequent speaker on machine learning and data science. He holds a PhD in computer science from the University of Maryland, College Park.
©2015, O'Reilly Media, Inc. • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse, or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.