Presented By O’Reilly and Cloudera

San Jose • London • New York

Make Data Work

March 5–6, 2018: Training
March 6–8, 2018: Tutorials & Conference
San Jose, CA

Enough data engineering for a data scientist; or, How I learned to stop worrying and love the data scientists

Stephen O'Sullivan (Data Whisperers)

11:50am–12:30pm Thursday, March 8, 2018

Data science and machine learning
Location: LL20 C

Average rating: 4.25 (4 ratings)

Who is this presentation for?

Data scientists and data scientists in training

Prerequisite knowledge

A basic understanding of SQL, data engineering design principles, core software engineering, and DevOps

What you'll learn

Gain an understanding of data engineering to improve productivity and the relationship between data scientists and data engineers

Description

How much data engineering should a data scientist know? To get to the fun part of their job, data scientists normally have to do a bit of data engineering; in most cases, 50%–80% of their time is spent onboarding or wrangling data. Their code and models are then handed over to the data engineering team to put into production (via dev, test, and QA). However, in most cases, the data engineering team has to do some modifications, rewrites, head shaking, and hand wringing to make the code production ready and meet the SLAs defined by the business, because there is a disconnect in how data scientists and data engineers develop code and models.

Stephen O’Sullivan takes you along the data science journey, from onboarding data (using a number of data/object stores) to understanding and choosing the right data format for the data assets to using query engines (and basic query tuning). You’ll learn how a distributed streaming platform works and how to take advantage of it, and you’ll explore good coding practices. Along the way, you’ll pick up new skills to help you be more productive and reduce contention with the data engineering team.

Stephen O'Sullivan

Data Whisperers

A leading expert on big data architectures, Stephen O’Sullivan has 25 years of experience creating scalable, high-availability data and applications solutions. A veteran of Silicon Valley Data Science, @WalmartLabs, Sun, and Yahoo, Stephen is an independent adviser to enterprises on all things data.

©2018, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com