Presented By O’Reilly and Cloudera
Make Data Work
March 5–6, 2018: Training
March 6–8, 2018: Tutorials & Conference
San Jose, CA

The future of ETL isn’t what it used to be

Gwen Shapira (Confluent)
11:00am–11:40am Wednesday, March 7, 2018
Secondary topics:  Data Integration and Data Pipelines
Average rating: 4.93 (14 ratings)

Who is this presentation for?

  • Data engineers

Prerequisite knowledge

  • Familiarity with Apache Kafka and data engineering

What you'll learn

  • Learn how to use Apache Kafka to build microservices-based data pipelines and use schemas to safely evolve data pipelines
  • Understand how and why you should enrich events by joining streams


Data integration is a difficult problem. We know this because 80% of the time in every project is spent getting the data you want the way you want it, and because the problem remains challenging despite 40 years of attempts to solve it. Software engineering practices constantly evolve, but in many organizations, data engineering teams still party like it's 1999.

Gwen Shapira shares design and architecture patterns used to modernize data engineering, detailing how Apache Kafka, microservices, and event streams help modern engineering organizations efficiently build data pipelines that are scalable, reliable, and built to evolve. Gwen starts with a discussion of how software engineering has changed over the last 20 years, focusing on microservices, stream processing, the cloud, and the proliferation of data stores. She then presents three core patterns of modern data engineering:

  • Building data pipelines from decoupled microservices
  • Agile evolution of these pipelines using schemas as a contract for microservices
  • Enriching data by joining streams of events
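The second pattern, schemas as a contract, can be sketched in plain Python. This is a hypothetical illustration of how schema resolution with defaults lets a pipeline evolve safely (in the spirit of Avro-style backward compatibility, not any specific Kafka or Schema Registry API); all field and schema names here are invented:

```python
# Hypothetical sketch: a reader projects each record onto its own schema,
# filling defaults, so records written with an older schema still parse
# after a new optional field is added. Names are illustrative only.

READER_SCHEMA_V2 = {
    "user_id": None,          # required (no default)
    "page": None,             # required (no default)
    "referrer": "unknown",    # new optional field with a default
}

def read_record(record, reader_schema):
    """Project a record onto the reader's schema, filling defaults."""
    out = {}
    for field, default in reader_schema.items():
        if field in record:
            out[field] = record[field]
        elif default is not None:
            out[field] = default  # the default makes the change compatible
        else:
            raise ValueError(f"missing required field: {field}")
    return out

# A record produced under the old schema is still readable under v2:
old_record = {"user_id": "u1", "page": "/home"}
print(read_record(old_record, READER_SCHEMA_V2))
```

Because the new field carries a default, producers can upgrade before consumers (or vice versa) without breaking the pipeline, which is what makes the schema a contract rather than a shared assumption.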

Gwen gives examples of how organizations use these patterns to move faster, not break things, and scale their data pipelines, and demonstrates how to implement them with Apache Kafka.
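The enrichment pattern above can be sketched in plain Python as well. This is a language-agnostic illustration of a stream-table join (the shape of enrichment that Kafka Streams provides), not the Kafka Streams API itself; all names are hypothetical:

```python
# Hypothetical sketch: enrich a stream of click events by joining it
# against a table materialized from a changelog stream of user profiles.

def build_table(changelog):
    """Materialize a changelog stream into a table: latest value per key."""
    table = {}
    for key, value in changelog:
        table[key] = value  # later updates overwrite earlier ones
    return table

def enrich(clicks, profiles_table):
    """Join each click event with the profile for its user key."""
    for user_id, click in clicks:
        profile = profiles_table.get(user_id)  # lookup side of the join
        yield {**click, "profile": profile}

profiles = build_table([
    ("u1", {"name": "Ada"}),
    ("u2", {"name": "Grace"}),
])
enriched = list(enrich([("u1", {"page": "/home"})], profiles))
```

The design point is that the "table" is just a compacted view of a stream, so the same event log that feeds one microservice can be materialized as lookup state in another, decoupling the two.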

Gwen Shapira


Gwen Shapira is a system architect at Confluent, where she helps customers achieve success with their Apache Kafka implementations. She has 15 years of experience working with code and customers to build scalable data architectures, integrating relational and big data technologies, and currently specializes in building real-time, reliable data processing pipelines using Apache Kafka. Gwen is an Oracle Ace Director, a coauthor of Hadoop Application Architectures, and a frequent presenter at industry conferences. She is also a committer on Apache Kafka and Apache Sqoop. When Gwen isn't coding or building data pipelines, you can find her pedaling her bike, exploring the roads and trails of California and beyond.