Presented By
O’Reilly + Cloudera
Make Data Work
29 April–2 May 2019
London, UK

Learning how to perform ETL data migrations with open source tool Embulk

Jason Bell (Independent Speaker)
14:55–15:35 Thursday, 2 May 2019
Data Engineering and Architecture
Location: Capital Suite 10/11

Who is this presentation for?

  • Data engineers and anyone involved in data migration

Level

Beginner

What you'll learn

  • Understand Embulk's capabilities and how it could help a data engineer with data migration

Description

A common data engineering task is moving data from one location to another, for example loading data files into a database or copying a relational database into a data warehouse.

Jason Bell introduces you to Embulk, an open source ETL tool designed to help engineers import and export data.

Topics include:

  • Embulk installation
  • An overview of the input and output plug-ins as well as how to install them
  • The YAML files Embulk uses for configuration
  • Handling incremental updates of data
  • Examples of common scenarios, such as file to database and database to data warehouse

Jason Bell

Independent Speaker

Jason Bell specializes in high-volume streaming systems for large retail customers and has used Kafka in a commercial context for the last five years. Jason was section editor for Java Developer’s Journal, has contributed to IBM developerWorks on autonomic computing, and is the author of Machine Learning: Hands On for Developers and Technical Professionals.