Fueling innovative software

July 15-18, 2019
Portland, OR

Big data for the small fry

Mike Lutz (Samtec)

5:05pm–5:45pm Thursday, July 18, 2019

The Next Architecture
Location: E143/144

Average rating: 4.50 (2 ratings)

Who is this presentation for?

IT and data people in smaller non-data-focused companies

Level

Beginner

Description

Over the last five years, a loud drumbeat has announced that big data is changing everything. But to the normal folks, the people whose primary product isn’t data, the technologies that make up the traditional big data suite look so incomprehensibly different that they seem nearly alien. These folks need something to bridge the technological gap into big data: something that feels like normal enterprise data and ETL tools but that can, if needed, scale, interact with, or be pushed out to the cloud. That bridge can be built from a very unexpected tool: the Jupyter Notebook.

A few months ago, Netflix started publishing blog posts about what appeared to be the misuse of a familiar tool: Jupyter Notebook, the CS equivalent of a printing calculator. Instead of thinking of Jupyter only as an interactive programming tool, what if you also took finished notebooks and had a tool that ran them noninteractively with parameterized inputs? That upside-down use transforms notebooks from an interactive programming environment into a self-documenting ETL tool. Netflix further pointed out that if you have cloud-based glue and scheduling systems (something the company has built internally but hasn’t publicly released), you can scale the system as well.
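As a rough illustration of what "running a finished notebook noninteractively with parameterized inputs" can mean, here is a minimal standard-library sketch. It is not papermill's implementation and not Samtec's code: it only mimics the general idea of adding a cell of overrides after a cell tagged "parameters" and then running the code cells in order. The notebook contents, the parameter names (region, min_orders), and the helper functions are all hypothetical.

```python
import json


def inject_parameters(notebook: dict, params: dict) -> dict:
    """Return a copy of `notebook` with one extra code cell that overrides defaults."""
    nb = json.loads(json.dumps(notebook))  # deep copy via a JSON round trip
    injected = {
        "cell_type": "code",
        "metadata": {"tags": ["injected-parameters"]},
        "execution_count": None,
        "outputs": [],
        "source": "\n".join(f"{name} = {value!r}" for name, value in params.items()),
    }
    # Place the overrides right after the first cell tagged "parameters".
    # If no such cell exists, this sketch puts them at the top of the notebook.
    index = -1
    for i, cell in enumerate(nb["cells"]):
        if "parameters" in cell.get("metadata", {}).get("tags", []):
            index = i
            break
    nb["cells"].insert(index + 1, injected)
    return nb


def run_code_cells(notebook: dict) -> dict:
    """Execute code cells top to bottom in one namespace (toy stand-in for a kernel)."""
    namespace: dict = {}
    for cell in notebook["cells"]:
        if cell["cell_type"] != "code":
            continue
        source = cell["source"]
        if isinstance(source, list):  # notebook files may store source as a list of lines
            source = "".join(source)
        exec(source, namespace)
    return namespace


# Hypothetical template notebook: defaults in a tagged cell, logic in the next cell.
template = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "code", "metadata": {"tags": ["parameters"]},
         "execution_count": None, "outputs": [],
         "source": "region = 'all'\nmin_orders = 1"},
        {"cell_type": "code", "metadata": {},
         "execution_count": None, "outputs": [],
         "source": "summary = f'{region}: orders >= {min_orders}'"},
    ],
}

run = inject_parameters(template, {"region": "us-west", "min_orders": 10})
result = run_code_cells(run)
print(result["summary"])
```

The template keeps working interactively with its defaults, while each noninteractive run produces its own copy of the notebook that records the parameters it was given. That copy is what makes the approach self-documenting.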

Mike Lutz explains how Samtec (a midsize manufacturing company) read this and was thrilled: it was a way to move its Python-ETL-writing developers directly into the cloud. There was one problem: Netflix didn’t explain how a small company would handle the glue and scheduling. Mike details the open source infrastructure Samtec assembled to fill the gaps in the Netflix Jupyter system and make it work for small groups: Jupyter/JupyterHub, nteract (Netflix) papermill, Apache Airflow, Docker (optionally Kubernetes), a cloud data service (S3), and cloud compute and VPN services (AWS EC2 and VPN).
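To make the glue and scheduling piece a little more concrete, the sketch below builds the kind of papermill command line that a scheduler could run once per day. It assumes the papermill CLI form `papermill INPUT OUTPUT -p NAME VALUE` and papermill's support for `s3://` output paths. The notebook path, bucket name, and parameter names are hypothetical, and this is not taken from the Samtec repository.

```python
import shlex
from datetime import date
from pathlib import PurePosixPath


def papermill_command(template: str, output_prefix: str, run_date: date, params: dict) -> list:
    """Build the argument list for one noninteractive notebook run.

    Each run writes its executed notebook under a dated folder, so the output
    doubles as a record of what ran and with which parameters.
    """
    name = PurePosixPath(template).name
    output = f"{output_prefix}/{run_date.isoformat()}/{name}"
    cmd = ["papermill", template, output]
    for key, value in params.items():
        cmd += ["-p", key, str(value)]  # one -p flag per parameter
    return cmd


# Hypothetical daily ETL run; the bucket and notebook names are placeholders.
cmd = papermill_command(
    "etl/daily_orders.ipynb",
    "s3://example-bucket/runs",
    date(2019, 7, 18),
    {"run_date": "2019-07-18", "region": "us-west"},
)
print(shlex.join(cmd))  # shell-ready form (requires Python 3.8+)
```

A scheduler such as Apache Airflow could run a command like this on a schedule, for example through its BashOperator, inside a Docker container that has papermill installed. How Samtec actually wires these pieces together is covered in the talk and its repository, not here.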

Prerequisite knowledge

A basic understanding of your company's data
Experience with scripting languages (e.g., Python)

What you'll learn

Understand how a company of any size can start leveraging big data and AI using open source tools, and how even small, non-data-centric groups can get into big data via Jupyter today
Learn how Jupyter + Papermill + Airflow + cloud CPU and storage help bridge the gap to big data

Mike Lutz

Samtec

Mike Lutz is an infrastructure lead at Samtec. Traditionally living in the data communications world, he stumbled into data (and big data) as a way to manage the floods of information that were being generated in his many telemetry and internet of things adventures.

Comments

Mike Lutz | Infrastructure Lead

07/18/2019 8:36am PDT

Links for topics covered in talk:

AI hierarchy of needs: https://hackernoon.com/the-ai-hierarchy-of-needs-18f111fcc007
Netflix Beyond Interactive: https://medium.com/netflix-techblog/notebook-innovation-591ee3221233
Repo for this talk: https://github.com/samtecspg/small-fry
Sign up for our newsletter: https://upscri.be/004efd

Mike Lutz | Infrastructure Lead

06/06/2019 11:15pm PDT

If you have any questions about the session this is a good place to ask.

If you would like to get some extra background in the technologies I’m going to talk about, here are a few other sessions I see on the schedule that look like they might help:
