Presented By
O’Reilly + Cloudera
Make Data Work
29 April–2 May 2019
London, UK

Nielsen presents: Fun with Kafka, Spark, and offset management

Simona Meriam (Nielsen)
14:05–14:45 Wednesday, 1 May 2019
Data Engineering and Architecture, Expo Hall
Location: Expo Hall 2 (Capital Hall N24)
Average rating: 4.57 (7 ratings)

Who is this presentation for?

  • Big data engineers, developers, and solution architects

Level

Intermediate

Prerequisite knowledge

  • Familiarity with Spark and Spark Streaming, Kafka, RDBMS, and SQL

What you'll learn

  • Understand the different ways of committing and retrieving your consumer offsets and the dos and don'ts of designing an offset management solution
  • Learn how to manage your Spark-Kafka consumer offsets in a relational database
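Storing offsets in a relational database can be sketched as follows. This is a minimal example that uses Python's built-in sqlite3 module in place of a production RDBMS. The table and function names (`events`, `offsets`, `open_store`, `load_offsets`, `store_batch`) are hypothetical, and Spark and Kafka are left out entirely: `records` stands in for the (partition, offset, payload) tuples of one micro-batch. The point it illustrates is the general pattern, not Nielsen's actual design: the output rows and the next offsets are written in a single transaction, so either both are saved or neither is.

```python
import sqlite3

# Hypothetical schema: processed events plus one row per (topic, partition)
# holding the next offset the consumer should read.
SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
    topic TEXT, part INTEGER, off INTEGER, payload TEXT,
    PRIMARY KEY (topic, part, off)
);
CREATE TABLE IF NOT EXISTS offsets (
    topic TEXT, part INTEGER, next_offset INTEGER,
    PRIMARY KEY (topic, part)
);
"""

def open_store(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn

def load_offsets(conn, topic):
    """Return {partition: next_offset} to resume from; empty if none stored."""
    rows = conn.execute(
        "SELECT part, next_offset FROM offsets WHERE topic = ?", (topic,))
    return dict(rows.fetchall())

def store_batch(conn, topic, records):
    """Write one batch and advance its offsets in ONE transaction.

    records: list of (partition, offset, payload) tuples.
    """
    next_offsets = {}
    for part, off, _ in records:
        next_offsets[part] = max(next_offsets.get(part, 0), off + 1)
    with conn:  # commits on success, rolls back everything on an exception
        conn.executemany(
            "INSERT INTO events VALUES (?, ?, ?, ?)",
            [(topic, p, o, v) for p, o, v in records])
        conn.executemany(
            "INSERT OR REPLACE INTO offsets VALUES (?, ?, ?)",
            [(topic, p, n) for p, n in next_offsets.items()])
```

On restart, a job would call `load_offsets(conn, topic)` and hand those positions to its Kafka consumer as starting offsets. Because a failed batch rolls back both its rows and its offset updates, the stored offsets never get ahead of, or fall behind, the stored data.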

Description

Nielsen Marketing Cloud (NMC) provides customers (both marketers and publishers) with real-time analytics tools to profile their target audiences. To do so, the company must ingest billions of events per day into its big data stores in a way that is scalable, cost-efficient, and consistent, with no data loss or duplication.

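When working with Spark and Kafka, data consistency depends on managing your consumer offsets correctly: if offsets are committed separately from the output, a failure between the two steps can lose data or duplicate it.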
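Why the order of "write output" and "commit offset" matters can be seen in a toy model. This sketch has no Kafka or Spark in it; `consume` is a hypothetical function that reads a single in-memory partition in batches, optionally crashes once between the two steps, and then resumes from its last committed offset.

```python
def consume(log, commit_first, crash_once, batch_size=2):
    """Toy consumer that restarts from its last committed offset.

    log:          messages in a single partition (offset = list index)
    commit_first: True  -> commit the offset, then write the output
                  False -> write the output, then commit the offset
    crash_once:   if True, the first batch "crashes" between the two steps
    Returns everything that ended up in the output sink.
    """
    sink = []
    committed = 0          # next offset to read, as persisted by the consumer
    crashed = False
    while committed < len(log):
        start = committed
        batch = log[start:start + batch_size]
        crash_now = crash_once and not crashed
        crashed = crashed or crash_now
        if commit_first:
            committed = start + len(batch)
            if crash_now:
                continue   # offset advanced, output never written: data loss
            sink.extend(batch)
        else:
            sink.extend(batch)
            if crash_now:
                continue   # output written, offset not advanced: duplicates
            committed = start + len(batch)
    return sink
```

With `log = ["m0", "m1", "m2", "m3"]` and a crash, committing first returns only `["m2", "m3"]`, while writing first returns `["m0", "m1"]` twice. Neither ordering is consistent on its own; avoiding both loss and duplication generally requires an idempotent output write or committing the output and the offsets atomically.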

Simona Meriam explains how NMC used to manage its Kafka consumer offsets with the Spark-Kafka 0.8 consumer and why the company decided to upgrade to the Spark-Kafka 0.10 consumer. She reviews the problems encountered during the upgrade and details the process that led to the solution.


Simona Meriam

Nielsen

Simona Meriam is a big data engineer at Nielsen Marketing Cloud, where she specializes in research and development of solutions for big data infrastructures using cutting-edge technologies such as Spark, Kafka, and Elasticsearch.