Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

Data science at team scale: Considerations for sharing, collaborating, and getting to production

Tristan Zajonc (Cloudera), Thomas Dinsmore (DataRobot), Lucas Glass (QuintilesIMS)
1:15pm–1:55pm Wednesday, September 27, 2017
Machine Learning & Data Science
Location: 1A 08/10
Level: Intermediate
Average rating: 3.00 (1 rating)

Who is this presentation for?

  • Data scientists, data engineers, and enterprise architects

Prerequisite knowledge

  • Experience as a practicing data scientist (preferably using either Python or R), a data engineer responsible for putting models into production, or an architect charged with supporting data scientists with a Hadoop-based platform

What you'll learn

  • Discover common problems to look out for when collaborating with other data scientists on big data platforms
  • Learn best practices and tools to overcome those challenges


As children, we were taught that sharing is caring. As data scientists, we often succeed only by building on the work of others and ensuring that others can build on ours. This means being able to easily find datasets, projects, and models. It also means being able to hand off a project to another data scientist or to a data engineer who can help operationalize the latest model. However, this is easier said than done.

Differing Python/R environments, multiple algorithm implementations, “unscalable” machine learning, and novel libraries (e.g., deep learning) not yet approved by IT are only a few of the challenges customers regularly encounter. When coupled with the security requirements of regulated industries, it can be incredibly difficult to share data and analysis, let alone reproduce or deploy machine learning in the enterprise. As a result, many organizations struggle to build a scalable data science practice.

Tristan Zajonc and Thomas Dinsmore discuss common technology considerations and patterns for collaboration in large teams and for moving machine learning into production at scale.

Tristan Zajonc


Tristan Zajonc is a senior engineering manager at Cloudera. Previously, he was cofounder and CEO of Sense, a visiting fellow at Harvard’s Institute for Quantitative Social Science, and a consultant at the World Bank. Tristan holds a PhD in public policy and an MPA in international development from Harvard and a BA in economics from Pomona College.

Thomas Dinsmore


Thomas W. Dinsmore is a Senior Director at DataRobot. Previously, he served as Director of Product Marketing for Cloudera Data Science; as a Knowledge Expert on the Strategic Analytics team at the Boston Consulting Group; as Director of Product Management for Revolution Analytics; and in consulting roles at IBM Big Data Solutions, SAS, PricewaterhouseCoopers, and Oliver Wyman. Thomas has led or contributed to analytic solutions for more than five hundred clients across vertical markets and around the world, including AT&T, Banco Santander, Citibank, Dell, J.C. Penney, Monsanto, Morgan Stanley, Office Depot, Sony, Staples, UnitedHealth Group, UBS, and Vodafone. His international experience includes work for clients in the United States, Puerto Rico, Canada, Mexico, Venezuela, Brazil, Chile, the United Kingdom, Belgium, Spain, Italy, Turkey, Israel, Malaysia, and Singapore.

Lucas Glass


Lucas Glass is the Global Analytics Lead within the Analytics Center of Excellence at QuintilesIMS. His teams build data science and artificial intelligence microservices to make the design, planning, and execution of clinical research more efficient. Prior to QuintilesIMS, Lucas worked on healthcare fraud analytics at the Department of Justice. He holds a master's in biostatistics from Drexel University and is a PhD candidate in statistics at Temple University.