Presented By
O’Reilly + Cloudera
Make Data Work
29 April–2 May 2019
London, UK

Stream, stream, stream: Different streaming methods with Spark and Kafka

Itai Yaffe (Nielsen)
11:15–11:55 Wednesday, 1 May 2019
Data Engineering and Architecture, Expo Hall
Location: Expo Hall 2 (Capital Hall N24)
Average rating: 4.45 (11 ratings)

Who is this presentation for?

  • Architects, data engineers, and big data developers

Level

Advanced

Prerequisite knowledge

  • A basic understanding of Kafka and Spark Streaming

What you'll learn

  • Explore various options for using Kafka and Spark to ingest billions of events per day
  • Learn how to scale your data infrastructure, based on those tools, in a cost-efficient manner

Description

NMC (Nielsen Marketing Cloud) provides customers (both marketers and publishers) with real-time analytics tools to profile their target audiences. To achieve that, the company needs to ingest billions of events per day into its big data stores in a scalable, cost-efficient way.

Itai Yaffe explains how NMC continuously transforms its data infrastructure to support these goals. Itai details how the company went from CSV files and standalone Java applications to multiple Kafka and Spark clusters, performing a mixture of streaming and batch ETLs, and supporting 10x data growth. Join in to hear about the company’s experience as an early adopter of Spark Streaming and Spark Structured Streaming and how it overcame the technical barriers it faced (and there were plenty).

Itai concludes by sharing a rather unique solution of using Kafka to imitate streaming over NMC’s data lake while significantly reducing cloud services costs.

Topics include:

  • Kafka and Spark Streaming for stateless and stateful use cases
  • Spark Structured Streaming as a possible alternative
  • Combining Spark Streaming with batch ETLs
  • “Streaming” over a data lake using Kafka

Itai Yaffe

Nielsen

Itai Yaffe is a big data tech lead at Nielsen Identity Engine, where he deals with big data challenges using tools like Spark, Druid, Kafka, and others. He’s also part of the core team of the Israeli chapter of Women in Big Data. Itai is keen on sharing his knowledge and has presented his real-life experience in various forums.