Sep 23–26, 2019

Data science versus engineering: Does it really have to be this way?

Ann Spencer (Domino), Amy Heineike (Primer), Paco Nathan (derwen.ai), Chris Wiggins (NYT | Columbia)
2:05pm–2:45pm Wednesday, September 25, 2019
Location: 1A 08/10
Secondary topics: Culture and Organization
Average rating: 4.50 (2 ratings)

Who is this presentation for?

  • Data scientists, machine learning engineers, researchers, data engineers, and data product managers

Level

Intermediate

Description

Collaboration between data science and engineering is a known challenge, one with the potential to stymie innovation and hobble the acceleration of data science work. We in data science can shrug our shoulders and accept that this is just the way it is, or decide that it’s too hard a problem and go solve something else instead. Yet data science is grounded in the idea of solving previously unsolvable problems. We’ve all heard stories from brilliant data scientists and exceptional engineers about their frustrations collaborating to develop and deploy models. This is not an insurmountable problem.

Paco Nathan, Amy Heineike, and Chris Wiggins explore differing perspectives on collaboration when building and deploying models. Topics the panel will candidly discuss include tension points that can arise (e.g., those stemming from a sense of ownership over workflows, the sheer amount and variety of work involved, or differing expectations about process), ways to address those tension points (e.g., mindset, communication best practices, and cross-training), and hopeful reflections on a potential future state.

Panelists

Paco Nathan is known as a “player/coach,” with core expertise in data science, natural language processing, machine learning, and cloud computing. He’s the evil mad scientist at derwen, cochair of the Rev conference, and an advisor for Amplify, Deep Learning Analytics, Recognai, Data Spartan, and Primer.

Amy Heineike is the vice president of product engineering at Primer, where she leads teams to build machines that read and write text leveraging natural language processing (NLP), natural language generation (NLG), and a host of other algorithms to augment human analysts. Previously, she built out technology for visualizing large document sets as network maps at Quid. A Cambridge mathematician who previously worked in London modeling cities, Amy is fascinated by complex human systems and the algorithms and data that help us understand them.

Chris Wiggins is an associate professor of applied mathematics at Columbia University and the chief data scientist at The New York Times. At Columbia, he is a founding member of the executive committee of the Data Science Institute and of the Department of Systems Biology, and he is affiliated faculty in Statistics.

Moderator: Ann Spencer is the head of content at Domino. She’s responsible for ensuring Domino’s data science content provides a high degree of value, density, and analytical rigor that sparks respectful, candid public discourse from multiple perspectives, discourse anchored in the intention of helping accelerate data science work. Previously, she was the data editor at O’Reilly, focusing on data science and data engineering; it was in this role that she met and worked with the panelists.

Prerequisite knowledge

  • Familiarity with model development and model deployment

What you'll learn

  • Be able to identify common tension points that arise when building data- and ML-driven products
  • Understand why these tension points arise, so you can solve for them
  • Gain practical advice for addressing the tension points, including building and hardening communication lines between functional roles through pairing, product prototyping (e.g., "Wizard of Ozzing"), tech talks, and weekly seminars, as well as working toward interdisciplinary understanding through cross-functional work or cross-training

Ann Spencer

Domino

Ann Spencer is the head of content at Domino. She’s responsible for ensuring Domino’s data science content provides a high degree of value, density, and analytical rigor that sparks respectful, candid public discourse from multiple perspectives, discourse anchored in the intention of helping accelerate data science work. Previously, she was the data editor at O’Reilly (2012–2014), focusing on data science and data engineering.


Amy Heineike

Primer

Amy Heineike is the vice president of product engineering at Primer, where she leads teams to build machines that read and write text leveraging natural language processing (NLP), natural language generation (NLG), and a host of other algorithms to augment human analysts. Previously, she built out technology for visualizing large document sets as network maps at Quid. A Cambridge mathematician who previously worked in London modeling cities, Amy is fascinated by complex human systems and the algorithms and data that help us understand them.


Paco Nathan

derwen.ai

Paco Nathan is known as a “player/coach” with core expertise in data science, natural language processing, machine learning, and cloud computing. He has 35+ years of experience in the tech industry, at companies ranging from Bell Labs to early-stage startups. His recent roles include director of the Learning Group at O’Reilly and director of community evangelism at Databricks and Apache Spark. Paco is the cochair of the Rev conference and an advisor for Amplify Partners, Deep Learning Analytics, Recognai, and Primer. He was named one of the "top 30 people in big data and analytics" in 2015 by Innovation Enterprise.


Chris Wiggins

NYT | Columbia



Comments

Ann Spencer | Head of Content
10/09/2019 11:53pm EDT

Hi Anushka,

We used a single slide for the talk. It included our names, pictures, and affiliations. We did not use any other slides.

If you missed the panel session, there is an older blog post that may help. It is located here: https://blog.dominodatalab.com/data-science-vs-engineering-tension-points/.

Ann

Anushka Jadhav | sr software engineer
10/09/2019 4:34pm EDT

Hi, can you please post the slides for this talk?

