As children, we were taught that sharing is caring. For data scientists, success often requires building on the work of others and ensuring that others can build on yours. This means being able to easily find datasets, projects, and models. It also means being able to hand off your own project to another data scientist or to a data engineer who can help operationalize your latest model. However, this is easier said than done.
Differing Python/R environments, multiple algorithm implementations, “unscalable” machine learning, and novel libraries (e.g., deep learning) not yet approved by IT are only a few of the challenges customers encounter on a regular basis. Coupled with the security requirements of regulated industries, these challenges can make it incredibly difficult to share data and analysis, let alone reproduce or deploy machine learning in the enterprise. As a result, many organizations struggle to build a scalable data science practice.
Tristan Zajonc and Thomas Dinsmore discuss common technology considerations and patterns for collaboration in large teams and for moving machine learning into production at scale.
Tristan Zajonc is a senior engineering manager at Cloudera. Previously, he was cofounder and CEO of Sense, a visiting fellow at Harvard’s Institute for Quantitative Social Science, and a consultant at the World Bank. Tristan holds a PhD in public policy and an MPA in international development from Harvard and a BA in economics from Pomona College.
Thomas W. Dinsmore is director of product marketing for Cloudera Data Science. Previously, he served as a knowledge expert on the strategic analytics team at the Boston Consulting Group; director of product management for Revolution Analytics; analytics solution architect at IBM Big Data Solutions; and a consultant at SAS, PricewaterhouseCoopers, and Oliver Wyman. Thomas has led or contributed to analytic solutions for more than five hundred clients across vertical markets and around the world, including AT&T, Banco Santander, Citibank, Dell, J.C.Penney, Monsanto, Morgan Stanley, Office Depot, Sony, Staples, United Health Group, UBS, and Vodafone. His international experience includes work for clients in the United States, Puerto Rico, Canada, Mexico, Venezuela, Brazil, Chile, the United Kingdom, Belgium, Spain, Italy, Turkey, Israel, Malaysia, and Singapore.
Lucas Glass is the global analytics lead within the Analytics Center of Excellence at QuintilesIMS. His teams build data science and artificial intelligence microservices to make the design, planning, and execution of clinical research more efficient. Prior to QuintilesIMS, Lucas worked on healthcare fraud analytics at the Department of Justice. He holds a master's in biostatistics from Drexel University and is a PhD candidate in statistics at Temple University.
©2017, O'Reilly Media, Inc. • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.