Sep 23–26, 2019

Data science versus engineering: Does it really have to be this way?

Ann Spencer (Domino), Paco Nathan (Derwen), Amy Heineike (Primer), Pete Warden (TensorFlow)
11:20am–12:00pm Wednesday, September 25, 2019
Location: 1A 08/10
Secondary topics: Culture and Organization

Who is this presentation for?

  • Data scientists, machine learning engineers, researchers, data engineers, and data product managers

Level

Intermediate

Description

Collaboration between data science and engineering is a known challenge, one with the potential to stymie innovation and hobble the acceleration of data science work. We in data science could shrug our shoulders, accept that this is just the way it is or that the problem is too hard, and decide to solve something else. Yet data science is grounded in the idea of solving previously unsolvable problems. We’ve all heard brilliant data scientists and exceptional engineers describe their frustrations with collaborating on developing and deploying models. This is not an insurmountable problem.

Paco Nathan, Amy Heineike, and Pete Warden explore differing perspectives about collaboration when building and deploying models. Topics candidly discussed during the panel will include tension points that arise (e.g., from a sense of ownership over workflows, the sheer amount and variety of work involved, or differing expectations about process), problem-solving to address those tension points (e.g., mindset, communication best practices, and cross-training), and hopeful reflections on the potential future state.

Panelists

Paco Nathan is known as a “player/coach,” with core expertise in data science, natural language processing, machine learning, and cloud computing. He’s the evil mad scientist at Derwen, cochair of the Rev conference, and an advisor for Amplify, Deep Learning Analytics, Recognai, Data Spartan, and Primer.

Amy Heineike is the vice president of product engineering at Primer, where she leads teams to build machines that read and write text leveraging natural language processing (NLP), natural language generation (NLG), and a host of other algorithms to augment human analysts. Previously, she built out technology for visualizing large document sets as network maps at Quid. A Cambridge mathematician who previously worked in London modeling cities, Amy is fascinated by complex human systems and the algorithms and data that help us understand them.

Pete Warden is the technical lead on the TensorFlow mobile embedded team at Google doing deep learning. Previously, he was CTO of Jetpac, which was acquired by Google, and worked on GPU optimizations for image processing at Apple. He’s written several books on data processing for O’Reilly and blogs at petewarden.com.

Moderator: Ann Spencer is the head of content at Domino. She’s responsible for ensuring Domino’s data science content provides a high degree of value, density, and analytical rigor that sparks respectful candid public discourse from multiple perspectives, discourse that’s anchored in the intention of helping accelerate data science work. Previously, she was the data editor at O’Reilly, focusing on data science and data engineering. It was in this role that she met and worked with the panelists.

Prerequisite knowledge

  • Familiarity with model development and model deployment

What you'll learn

  • Identify specific, common tension points that arise when building data- and ML-driven products
  • Understand the why behind the tension points so you can problem-solve them
  • Gain practical advice for addressing the tension points, including building and hardening communication lines between different functional roles through pairing, product prototyping (i.e., "Wizard of Ozzing"), tech talks, and weekly seminars, as well as working toward interdisciplinary understanding through cross-functional work or cross-training

Ann Spencer

Domino

Ann Spencer is the head of content at Domino. She’s responsible for ensuring Domino’s data science content provides a high degree of value, density, and analytical rigor that sparks respectful candid public discourse from multiple perspectives, discourse that’s anchored in the intention of helping accelerate data science work. Previously, she was the data editor at O’Reilly (2012–2014), focusing on data science and data engineering.

Paco Nathan

Derwen

Paco Nathan is known as a player/coach, with core expertise in data science, natural language processing, machine learning, and cloud computing. He has 35+ years of tech industry experience, ranging from Bell Labs to early-stage startups. He’s co-chair of the Rev conference and former co-chair of JupyterCon, and an advisor for NYU Coleridge Initiative, IBM Data Science Community, Amplify Partners, Recognai, Primer AI, and Data Spartan. He was cited in 2015 as one of the Top 30 People in Big Data and Analytics by Innovation Enterprise and is co-author of the upcoming Rich Search and Discovery for Research Datasets.

Amy Heineike

Primer

Amy Heineike is the vice president of product engineering at Primer, where she leads teams to build machines that read and write text leveraging natural language processing (NLP), natural language generation (NLG), and a host of other algorithms to augment human analysts. Previously, she built out technology for visualizing large document sets as network maps at Quid. A Cambridge mathematician who previously worked in London modeling cities, Amy is fascinated by complex human systems and the algorithms and data that help us understand them.

Pete Warden

TensorFlow

Pete Warden is the technical lead on the TensorFlow mobile embedded team at Google doing deep learning. Previously, he was CTO of Jetpac, which was acquired by Google, and worked on GPU optimizations for image processing at Apple. He’s written several books on data processing for O’Reilly and blogs at petewarden.com.
