Presented By O’Reilly and Cloudera
Make Data Work
September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

The future of ETL isn’t what it used to be.

Gwen Shapira (Confluent)
11:20am–12:00pm Wednesday, 09/12/2018
Data engineering and architecture
Location: 1A 23/24
Level: Intermediate
Secondary topics: Data Integration and Data Pipelines
Average rating: 4.00 (4 ratings)

Who is this presentation for?

  • Data engineers and architects

Prerequisite knowledge

  • Familiarity with ETL concepts and Apache Kafka (useful but not required)

What you'll learn

  • Understand the current challenges of data integration, industry trends that data architects need to take into account when designing modern data pipelines, and design patterns that are proven to work well in modern architectures


Data integration is a difficult problem. We know this because 80% of the time in every project is spent getting the data you want the way you want it. We know this because the problem remains challenging despite 40 years of attempts to solve it. Software engineering practices have constantly evolved, but in many organizations data engineering teams still party like it's 1999.

Gwen Shapira shares design and architecture patterns that are used to modernize data engineering. You’ll learn how modern engineering organizations use Apache Kafka, microservices, and event streams to efficiently build data pipelines that are scalable, reliable, and built to evolve.

Gwen begins with a discussion of how software engineering has changed in the last 20 years, focusing on microservices, stream processing, the cloud, and the proliferation of data stores. These changes represent both a challenge and an opportunity for data engineers. Gwen then outlines three core patterns of modern data engineering: building data pipelines from decoupled microservices, the Agile evolution of these pipelines using schemas as a contract for microservices, and enriching data by joining streams of events. She walks you through examples of how organizations are using these patterns to move faster, not break things, and scale their data pipelines, and she demonstrates how to implement them with Apache Kafka.

Gwen Shapira


Gwen Shapira is a system architect at Confluent, where she helps customers achieve success with their Apache Kafka implementations. She has 15 years of experience working with code and customers to build scalable data architectures, integrating relational and big data technologies. Gwen currently specializes in building reliable real-time data processing pipelines using Apache Kafka. Gwen is an Oracle ACE Director, the coauthor of Hadoop Application Architectures, and a frequent presenter at industry conferences. She is also a committer on Apache Kafka and Apache Sqoop. When Gwen isn’t coding or building data pipelines, you can find her pedaling her bike, exploring the roads and trails of California and beyond.

Comments on this page are now closed.


09/12/2018 7:29am EDT

Can you please share the slides?