How much data engineering should a data scientist know? To get to the fun part of their job, data scientists normally have to do a bit of data engineering first: in most cases, 50%–80% of their time is spent onboarding or wrangling data. Their work is then handed over to the data engineering team to put into production (via dev, test, and QA). However, in most cases, the data engineering team has to do some modifications, rewrites, head shaking, and hand wringing to make the code production ready and meet the SLAs defined by the business, because data scientists and data engineers develop code and models differently.
Stephen O’Sullivan takes you along the data science journey, from onboarding data (using a number of data/object stores), to understanding and choosing the right data format for your data assets, to using query engines (and basic query tuning). You’ll learn how a distributed streaming platform works and how to take advantage of it, and you’ll explore good coding practices. Along the way, you’ll pick up some new skills to help you be more productive and reduce contention with the data engineering team.
A leading expert on big data architectures, Stephen O’Sullivan has 25 years of experience creating scalable, high-availability data and applications solutions. A veteran of Silicon Valley Data Science, @WalmartLabs, Sun, and Yahoo, Stephen is an independent adviser to enterprises on all things data.
©2018, O'Reilly Media, Inc.