Presented By O'Reilly and Cloudera
Make Data Work
Dec 4–5, 2017: Training
Dec 5–7, 2017: Tutorials & Conference
Singapore

Data science at team scale: Considerations for sharing, collaborating, and getting to production

Thomas Dinsmore (DataRobot), Johnson POH (DBS)
4:15pm–4:55pm Thursday, December 7, 2017
Average rating: 4.33 (3 ratings)

Who is this presentation for?

  • Data scientists, data engineers, and enterprise architects

Prerequisite knowledge

  • Experience as a practicing data scientist (preferably using Python or R), as a data engineer responsible for putting models into production, or as an architect charged with supporting data scientists on a Hadoop-based platform

What you'll learn

  • Understand common problems to look out for when collaborating with other data scientists on big data platforms
  • Learn best practices and tools to overcome those challenges

Description

As children, we are taught that sharing is caring. For data scientists, success often requires building on the work of others and ensuring others can build on yours. This means being able to easily find datasets, projects, and models and being able to hand off your own project to another data scientist or a data engineer who can help operationalize your latest model. However, this is easier said than done.

Differing Python and R environments, multiple algorithm implementations, “unscalable” machine learning, and novel libraries (e.g., deep learning) not yet approved by IT are only a few of the challenges customers encounter on a regular basis. When coupled with the security requirements of regulated industries, it can be incredibly difficult to share data and analysis, let alone reproduce or deploy machine learning in the enterprise. As a result, many organizations struggle to build a scalable data science practice.

Thomas Dinsmore and Johnson Poh share common technology considerations and patterns for collaboration between data scientists, data engineers, and the business teams they support, as well as best practices for moving machine learning into production at scale.

Thomas Dinsmore

DataRobot

Thomas W. Dinsmore is a Senior Director at DataRobot. Previously, he served as Director of Product Marketing for Cloudera Data Science; as a Knowledge Expert on the Strategic Analytics team at the Boston Consulting Group; as Director of Product Management for Revolution Analytics; and in consulting roles at IBM Big Data Solutions, SAS, PricewaterhouseCoopers, and Oliver Wyman. Thomas has led or contributed to analytic solutions for more than five hundred clients across vertical markets and around the world, including AT&T, Banco Santander, Citibank, Dell, J.C. Penney, Monsanto, Morgan Stanley, Office Depot, Sony, Staples, UnitedHealth Group, UBS, and Vodafone. His international experience includes work for clients in the United States, Puerto Rico, Canada, Mexico, Venezuela, Brazil, Chile, the United Kingdom, Belgium, Spain, Italy, Turkey, Israel, Malaysia, and Singapore.

Johnson POH

DBS

Johnson Poh heads the data science practice at DBS’s Big Data Analytics Center of Excellence, where he drives the development of core data science capabilities for enhancing decision analysis. He spent the past decade leading teams in applying statistical learning models across the government, pharmaceutical, and financial industries. Johnson holds a postgraduate degree in statistical computing from Yale University and bachelor’s degrees in mathematics, statistics, and economics from UC Berkeley.