Brought to you by NumFOCUS Foundation and O’Reilly Media Inc.
The official Jupyter Conference
August 22-23, 2017: Training
August 23-25, 2017: Tutorials & Conference
New York, NY

Productionalizing Notebooks for ETL

Moderated by: Bill Walrond

Who is this presentation for?

Data Engineers

Prerequisite knowledge

Familiarity with production data pipelines

What you'll learn

Notebooks are the new Integrated Development Environment (IDE)

Description

Notebooks have long been viewed as tools used exclusively by data scientists. In this case study, Kevin Rasmussen, Solution Architect, will debunk that notion with a practical demonstration of how notebooks can be used for ETL, specifically for Spark applications. Contrary to what many believe, getting notebook code to run in production is not difficult in any cloud ecosystem. Rasmussen will show this through a case study from a current project with one of the country's most respected newspapers, built on Google Cloud Platform.

In this project, the architect adopted the mindset “We don’t run notebooks, we run Python,” and began by converting the notebook to a Python script. The audience will get an inside look at each subsequent step that turned the notebook into one of the most versatile and critical elements of the data ecosystem. Other topics include creating a single Spark application at the start of the job versus giving every job its own application.
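The “we run Python” step amounts to extracting a notebook's code cells into a plain script that a scheduler can execute. The standard tool for this is `jupyter nbconvert --to script`; the sketch below is only a stdlib illustration of what that conversion does, not the presenter's actual pipeline. It assumes an nbformat-4 `.ipynb` file, and the function names `notebook_to_python` and `convert_file` are hypothetical.

```python
import json
from pathlib import Path

# IPython line/cell magics and shell escapes are not valid plain Python
MAGIC_PREFIXES = ("%", "!")


def notebook_to_python(notebook_json: str) -> str:
    """Join the code cells of an nbformat-4 notebook into one Python script."""
    nb = json.loads(notebook_json)
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells are not executable
        source = cell.get("source", "")
        # nbformat allows source as a single string or a list of line strings
        if isinstance(source, list):
            source = "".join(source)
        # Comment out magics and shell escapes rather than silently dropping them
        lines = [
            "# " + line if line.lstrip().startswith(MAGIC_PREFIXES) else line
            for line in source.splitlines()
        ]
        chunks.append("\n".join(lines))
    return "\n\n".join(chunks) + "\n"


def convert_file(src: str, dst: str) -> None:
    """Hypothetical helper: read an .ipynb file and write the extracted script."""
    text = Path(src).read_text(encoding="utf-8")
    Path(dst).write_text(notebook_to_python(text), encoding="utf-8")
```

Once the notebook exists as a `.py` file, it can be versioned, tested, and handed to a job runner (for a Spark job, for example, `spark-submit`) like any other Python program, which is the point of the “we don’t run notebooks” mindset.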