Presented By O’Reilly and Cloudera
Make Data Work
September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

Data science in an Agile environment: Methods and organization for success

Sam Helmich (Deere & Company)
10:00am–10:30am Tuesday, 09/11/2018
Data-driven business management
Location: 1E 10
Level: Non-technical
Secondary topics: Machine Learning in the enterprise

Who is this presentation for?

  • Managers, data scientists, data wranglers, and data catalysts

What you'll learn

  • Learn how to apply Agile to data science and how to structure a team for effective data science work

Data science continues to mature as a discipline, and as it does, a wide array of processes and structures has emerged for working as a practitioner within a business. Sam Helmich explains how data science can benefit from borrowing Agile principles and adopting the right mindset around team organization, enabling you to deliver better insights at a faster pace than older, more traditional methods allow.

Agile has a few basic building blocks that lend themselves well to practicing data science, chief among them rapid iteration, frequent stakeholder contact, and transparency about the work being done. Creating work plans with short cycles allows practitioners to fail and succeed quickly, which helps reveal the data structures and modeling procedures that provide the most benefit. Iteration also focuses practitioners on high-value, low-effort items that can make a project successful as early as possible, and it makes high-value, high-effort items more approachable by breaking them down into easily digestible chunks. Additionally, iteration within the team allows roadblocks to be identified and resolved quickly, so more time can be spent on actual development. Those short cycles also provide opportunities to share progress with stakeholders outside the Agile team.

The Agile ethos of being transparent about who is doing what work points to team structure as another critical element for success. Organizations frequently try to find "unicorns" who are business domain experts as well as experts in data manipulation and data science, but they are often disappointed to discover that the global pool of these unicorns is extremely small. John Deere, for instance, has found it highly effective to structure data science teams in groups of three: an expert in data manipulation (a data wrangler), an expert in modeling (a data scientist), and an expert communicator and business interface who knows the problem domain and the stakeholders' shared interests (a data catalyst). It's far easier to identify people who excel in one of the three areas than to find one person who can handle all three.

Coupled with Agile principles, these teams can cycle more quickly through data processing steps and modeling iterations. The results help identify useful data and modeling concepts while the data environment is developed concurrently. Iterating on communication as well keeps the team aligned with customer needs and gives the customer an active voice in the project.

Sam Helmich

Deere & Company

Sam Helmich is a data scientist in John Deere’s Intelligent Solutions Group. Previously, he worked in applied analytics roles within John Deere Worldwide Parts and Global Order Fulfillment. Sam holds an MS in statistics from Iowa State University.