Presented By O’Reilly and Cloudera
Make Data Work
September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

The future of ETL isn’t what it used to be.

Gwen Shapira (Confluent)
11:20am–12:00pm Wednesday, 09/12/2018
Data engineering and architecture
Location: 1A 23/24 Level: Intermediate
Secondary topics: Data Integration and Data Pipelines
Average rating: ★★★★ (4.00, 4 ratings)

Who is this presentation for?

  • Data engineers and architects

Prerequisite knowledge

  • Familiarity with ETL concepts and Apache Kafka (useful but not required)

What you'll learn

  • Understand the current challenges of data integration, industry trends that data architects need to take into account when designing modern data pipelines, and design patterns that are proven to work well in modern architectures

Description

Data integration is a difficult problem. We know this because, in project after project, 80% of the time is spent getting the data you want in the shape you want it. We know this because the problem remains challenging despite 40 years of attempts to solve it. Software engineering practices have evolved constantly, but in many organizations data engineering teams still party like it's 1999.

Gwen Shapira shares design and architecture patterns that are used to modernize data engineering. You’ll learn how modern engineering organizations use Apache Kafka, microservices, and event streams to efficiently build data pipelines that are scalable, reliable, and built to evolve.

Gwen begins with a discussion of how software engineering has changed in the last 20 years, focusing on microservices, stream processing, the cloud, and the proliferation of data stores. These changes represent both a challenge and opportunity for data engineers. Gwen then outlines three core patterns of modern data engineering: building data pipelines from decoupled microservices, the Agile evolution of these pipelines using schemas as a contract for microservices, and enriching data by joining streams of events. She walks you through examples of how organizations are using these patterns to move faster, not break things, and scale their data pipelines and demonstrates how to implement them with Apache Kafka.
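The third pattern above, enriching data by joining streams of events, is what Kafka Streams calls a stream-table join: each incoming event is looked up against a changelog-backed table and emitted with the matching reference data attached. A minimal sketch of the idea in plain Python (no Kafka; the event and field names here are invented for illustration, not taken from the talk):

```python
# Toy stream-table join: enrich a stream of order events with customer
# profile data, analogous to a KStream-KTable left join in Kafka Streams.

def enrich(events, customer_table):
    """Yield each event joined with its customer profile by key."""
    for event in events:
        profile = customer_table.get(event["customer_id"])
        if profile is None:
            # Unknown key: pass the event through unenriched,
            # as a left join in a stream processor would.
            yield {**event, "customer_name": None}
        else:
            yield {**event, "customer_name": profile["name"]}

# Hypothetical reference data (the "table") and event stream.
customers = {"c1": {"name": "Ada"}, "c2": {"name": "Grace"}}
orders = [
    {"order_id": 1, "customer_id": "c1", "amount": 42},
    {"order_id": 2, "customer_id": "c3", "amount": 7},
]

enriched = list(enrich(orders, customers))
```

In a real pipeline the table would be a compacted Kafka topic materialized as a KTable and the stream a KStream, with the join keyed and partitioned by `customer_id`; the left-join behavior shown here is one design choice (drop-on-miss is the other).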

Photo of Gwen Shapira

Gwen Shapira

Confluent

Gwen Shapira is a system architect at Confluent, where she helps customers achieve success with their Apache Kafka implementations. She has 15 years of experience working with code and customers to build scalable data architectures, integrating relational and big data technologies. Gwen currently specializes in building real-time reliable data processing pipelines using Apache Kafka. Gwen is an Oracle Ace Director, the coauthor of Hadoop Application Architectures, and a frequent presenter at industry conferences. She is also a committer on Apache Kafka and Apache Sqoop. When Gwen isn’t coding or building data pipelines, you can find her pedaling her bike, exploring the roads and trails of California and beyond.

Comments on this page are now closed.

Comments

Sergio Prado | Business Developer Manager
09/12/2018 7:29am EDT

Can you please share the slides?