Presented By O’Reilly and Cloudera
Make Data Work
September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

Data Science in an Agile Environment: Methods and Organization for Success

Sam Helmich (Deere & Company)
10:00am–10:30am Tuesday, 09/11/2018
Data-driven business management
Location: 1E 10
Level: Non-technical
Secondary topics: Machine Learning in the enterprise

Who is this presentation for?

Manager, Data Scientist, Data Wrangler, Data Catalyst

Prerequisite knowledge

None.

What you'll learn

How to work with Agile and how to structure a team for effective data science work.

Description

Data science continues to mature as a discipline, and as such, there exists a wide array of processes and structures for working as a practitioner within a business. By borrowing concepts from Agile and adopting the appropriate mindset around team organization, data science can deliver better insights at a faster pace than older, more traditional methods.

Agile has a few basic building blocks that lend themselves well to practicing data science, chief among them rapid iteration, frequent stakeholder contact, and transparency about the work being done. Creating work plans with short cycles allows practitioners to fail and succeed quickly, which can inform which data structures and modelling procedures will provide the most benefit. Iteration also helps focus practitioners on high-value, low-effort items that can make projects successful as early as possible, and it makes high-value, high-effort items more approachable by breaking them down into easily digestible chunks. Additionally, iteration within the team allows for quick identification and resolution of roadblocks, so more time can be spent on actual development. Those short cycles also provide opportunities to share progress with stakeholders outside the Agile team.

The Agile ethos of being transparent about who is doing what work makes team structure another critical element for success. Frequently, organizations attempt to find “unicorns” who are business domain experts as well as experts at data manipulation and data science, but they are often disappointed to find that the global pool of these unicorns is extremely small. At John Deere, we’ve found it to be incredibly effective to structure data science teams in groups of three: an expert at data manipulation (Data Wrangler), an expert in modelling (Data Scientist), and an expert communicator and business interface who knows the problem domain and the stakeholders’ shared interests (Data Catalyst). It is far easier to identify people who greatly excel in one of the three areas than to find one person who can handle all three.

Organizing the team in groups of three, coupled with Agile principles, enables quicker cycles through data processing steps and modelling iterations. The outcomes allow for identification of useful data and modelling concepts, and for concurrent development of the data environment. Iterating in communication helps keep the team aligned with customer needs and allows the customer to have an active voice in the project.

Sam Helmich

Deere & Company

Sam Helmich is a Data Scientist in John Deere’s Intelligent Solutions Group. He has worked in applied analytics roles within John Deere Worldwide Parts and Global Order Fulfillment, and has an MS in Statistics from Iowa State University.
