Presented By O’Reilly and Cloudera
Make Data Work
March 5–6, 2018: Training
March 6–8, 2018: Tutorials & Conference
San Jose, CA

The future of ETL isn’t what it used to be

Gwen Shapira (Confluent)
11:00am–11:40am Wednesday, March 7, 2018
Secondary topics: Data Integration and Data Pipelines
Average rating: 4.93 (14 ratings)

Who is this presentation for?

  • Data engineers

Prerequisite knowledge

  • Familiarity with Apache Kafka and data engineering

What you'll learn

  • Learn how to use Apache Kafka to build microservices-based data pipelines and use schemas to safely evolve data pipelines
  • Understand how and why you should enrich events by joining streams

Description

Data integration is a difficult problem. We know this because 80% of the time in every project is spent getting the data you want the way you want it and because this problem remains challenging despite 40 years of attempts to solve it. Software engineering practices constantly evolve, but in many organizations, data engineering teams still party like it’s 1999.

Gwen Shapira shares design and architecture patterns that are used to modernize data engineering and details how Apache Kafka, microservices, and event streams are used by modern engineering organizations to efficiently build data pipelines that are scalable, reliable, and built to evolve. Gwen starts with a discussion of how software engineering has changed over the last 20 years, focusing on microservices, stream processing, the cloud, and the proliferation of data stores. She then presents three core patterns of modern data engineering:

  • Building data pipelines from decoupled microservices
  • Agile evolution of these pipelines using schemas as a contract for microservices
  • Enriching data by joining streams of events
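
To make the second pattern concrete, here is a minimal sketch of a schema acting as a contract between a producing and a consuming service. It is plain Python, not a real schema registry or Avro library, and it is not taken from the talk. The schema format, the function names, and the "orders" fields are all hypothetical, and the compatibility rule is a simplified version of the usual one: a newer reader can handle older records only if every field it adds carries a default.

```python
from typing import Any, Dict

# Hypothetical schema format for illustration:
# field name -> {"type": ..., and optionally "default": ...}
Schema = Dict[str, Dict[str, Any]]


def is_backward_compatible(old: Schema, new: Schema) -> bool:
    """Return True if a consumer using `new` can read records written with `old`."""
    for name, spec in new.items():
        if name not in old:
            # A field the old writer never produced needs a default value.
            if "default" not in spec:
                return False
        elif old[name]["type"] != spec["type"]:
            # Simplification: any type change is treated as incompatible.
            return False
    return True


def read_with_schema(record: Dict[str, Any], reader: Schema) -> Dict[str, Any]:
    """Project a record onto the reader schema, filling defaults for missing fields."""
    out: Dict[str, Any] = {}
    for name, spec in reader.items():
        out[name] = record[name] if name in record else spec["default"]
    return out


# Hypothetical schema versions for an "orders" event.
orders_v1: Schema = {
    "order_id": {"type": "string"},
    "amount": {"type": "double"},
}
orders_v2: Schema = {**orders_v1, "currency": {"type": "string", "default": "USD"}}
orders_bad: Schema = {**orders_v1, "currency": {"type": "string"}}  # no default

old_record = {"order_id": "o-1", "amount": 9.5}
upgraded = read_with_schema(old_record, orders_v2)
```

Running a check like `is_backward_compatible` before deploying a new schema version is what lets independently deployed services evolve without breaking each other: `orders_v2` passes, while `orders_bad` is rejected because old records have no value for `currency`.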

Gwen gives examples of how organizations use these patterns to move faster, not break things, and scale their data pipelines and demonstrates how to implement them with Apache Kafka.
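
As a rough illustration of enriching events by joining streams, the sketch below joins a stream of order events against a table built from a stream of customer updates. It is plain Python over an in-memory list rather than Kafka Streams or any Kafka client, and it does not reproduce the talk's demo. The topic names, the event tuple shape, and the `enrich` function are assumptions made for the example.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical event shape: (topic, key, value)
Event = Tuple[str, str, dict]


def enrich(events: Iterable[Event]) -> Iterator[dict]:
    """Join order events with the latest known customer profile.

    Customer updates are materialized into a lookup table keyed by customer ID;
    each order is joined against that table at the moment it arrives.
    Orders with no known customer are dropped (inner-join semantics).
    """
    customers: Dict[str, dict] = {}
    for topic, key, value in events:
        if topic == "customers":
            customers[key] = value  # latest update wins
        elif topic == "orders":
            profile = customers.get(value["customer_id"])
            if profile is not None:
                yield {**value, "order_id": key, "customer": profile}


# Hypothetical interleaved input, in arrival order.
events = [
    ("customers", "c1", {"name": "Ada", "tier": "gold"}),
    ("orders", "o1", {"customer_id": "c1", "amount": 20.0}),
    ("orders", "o2", {"customer_id": "c2", "amount": 5.0}),  # unknown customer
    ("customers", "c1", {"name": "Ada", "tier": "platinum"}),
    ("orders", "o3", {"customer_id": "c1", "amount": 7.5}),
]

enriched = list(enrich(events))
```

Each order is enriched with the customer profile as it stood when the order arrived, so downstream consumers receive a self-contained event and do not need to query the customer service themselves.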

Gwen Shapira

Confluent

Gwen Shapira is a system architect at Confluent, where she helps customers achieve success with their Apache Kafka implementations. She has 15 years of experience working with code and customers to build scalable data architectures, integrating relational and big data technologies. Gwen currently specializes in building reliable real-time data-processing pipelines using Apache Kafka. Gwen is an Oracle ACE Director, the coauthor of Hadoop Application Architectures, and a frequent presenter at industry conferences. She is also a committer on Apache Kafka and Apache Sqoop. When Gwen isn’t coding or building data pipelines, you can find her pedaling her bike, exploring the roads and trails of California and beyond.