Data science versus engineering: Does it really have to be this way?
Who is this presentation for?
- Data scientists, machine learning engineers, researchers, data engineers, and data product managers
Level
Description
Collaboration between data science and engineering is a known challenge, one that can stymie innovation and slow the acceleration of data science work. We in data science could shrug our shoulders, accept that this is just the way it is or that the problem is too hard, and move on to solve something else. Yet data science is grounded in the idea of solving previously unsolvable problems. We’ve all heard brilliant data scientists and exceptional engineers describe their frustrations with collaborating on developing and deploying models. This is not an insurmountable problem.
Paco Nathan, Amy Heineike, and Chris Wiggins explore differing perspectives on collaboration when building and deploying models. Topics candidly discussed during the panel include the tension points that arise (e.g., those stemming from a sense of ownership over workflows, the sheer amount and variety of work involved, and differing expectations about process), ways to address those tension points (e.g., mindset, communication best practices, and cross-training), and hopeful reflections on the potential future state.
Panelists
Paco Nathan is known as a “player/coach,” with core expertise in data science, natural language processing, machine learning, and cloud computing. He’s the evil mad scientist at derwen, cochair of the Rev conference, and an advisor for Amplify, Deep Learning Analytics, Recognai, Data Spartan, and Primer.
Amy Heineike is the vice president of product engineering at Primer, where she leads teams to build machines that read and write text leveraging natural language processing (NLP), natural language generation (NLG), and a host of other algorithms to augment human analysts. Previously, she built out technology for visualizing large document sets as network maps at Quid. A Cambridge mathematician who previously worked in London modeling cities, Amy is fascinated by complex human systems and the algorithms and data that help us understand them.
Chris Wiggins is an associate professor of applied mathematics at Columbia University and the Chief Data Scientist at The New York Times. At Columbia he is a founding member of the executive committee of the Data Science Institute, and of the Department of Systems Biology, and is affiliated faculty in Statistics.
Moderator: Ann Spencer is the head of content at Domino. She’s responsible for ensuring Domino’s data science content provides a high degree of value, density, and analytical rigor that sparks respectful, candid public discourse from multiple perspectives, discourse that’s anchored in the intention of helping accelerate data science work. Previously, she was the data editor at O’Reilly, focusing on data science and data engineering. It was in this role that she met and worked with the panelists.
Prerequisite knowledge
- Familiarity with model development and model deployment
What you'll learn
- Be able to identify specific common tension points that arise when building data- and ML-driven products
- Understand why the tension points arise so that you can work on resolving them
- Gain practical advice for addressing the tension points, including building and hardening communication lines between different functional roles through pairing, product prototyping (i.e., "Wizard of Ozzing"), tech talks, and weekly seminars, as well as working toward interdisciplinary understanding through cross-functional work or cross-training
Ann Spencer
Domino
Ann Spencer is the head of content at Domino. She’s responsible for ensuring Domino’s data science content provides a high degree of value, density, and analytical rigor that sparks respectful, candid public discourse from multiple perspectives, discourse that’s anchored in the intention of helping accelerate data science work. Previously, she was the data editor at O’Reilly (2012–2014), focusing on data science and data engineering.
Amy Heineike
Primer
Amy Heineike is the vice president of product engineering at Primer, where she leads teams to build machines that read and write text leveraging natural language processing (NLP), natural language generation (NLG), and a host of other algorithms to augment human analysts. Previously, she built out technology for visualizing large document sets as network maps at Quid. A Cambridge mathematician who previously worked in London modeling cities, Amy is fascinated by complex human systems and the algorithms and data that help us understand them.
Paco Nathan
derwen.ai
Paco Nathan is known as a “player/coach” with core expertise in data science, natural language processing, machine learning, and cloud computing. He has 35+ years of experience in the tech industry, at companies ranging from Bell Labs to early-stage startups. His recent roles include director of the Learning Group at O’Reilly and director of community evangelism at Databricks and Apache Spark. Paco is the cochair of Rev conference and an advisor for Amplify Partners, Deep Learning Analytics, Recognai, and Primer. He was named one of the "top 30 people in big data and analytics" in 2015 by Innovation Enterprise.
Chris Wiggins
NYT | Columbia
Chris Wiggins is an associate professor of applied mathematics at Columbia University and the Chief Data Scientist at The New York Times. At Columbia he is a founding member of the executive committee of the Data Science Institute, and of the Department of Systems Biology, and is affiliated faculty in Statistics.
Comments
Hi Anushka,
We used a single slide for the talk. It included our names, pictures, and affiliations. We did not use any other slides.
If you missed the panel session, there is an older blog post that may help. It is located here: https://blog.dominodatalab.com/data-science-vs-engineering-tension-points/.
Ann
Hi, can you please post the slides for this talk?