Implementing ML models into production at Statistics Canada
Data science has arrived at Statistics Canada. Richard Evans explains how it happened.
Statistics Canada (Statcan) is a branch of the federal government and is Canada’s national statistical organization. It’s responsible for producing the country’s key economic and social indicators, such as the census of population, the unemployment and inflation rates, and measures of health outcomes. The arrival of big data and AI presents undreamed-of opportunities for Statcan’s statisticians to produce vastly more accurate, detailed, and relevant statistics than ever before. Statcan had many advantages that left it well placed to leverage these opportunities (such as a powerful legal framework, a strong information management culture and practices, an illustrious history of innovation, and deep expertise in data processing), but the fact that it was focused primarily on processing surveys (i.e., “small data”) posed certain challenges.
Join Richard to see how Statcan transitioned from longstanding, large, predictable survey programs employing data processing methods that were perfected in the ’90s and ’00s to Agile teams deploying ML models that run on vastly larger structured and unstructured datasets and are often embedded into legacy data processing structures. Building the data science capacity needed to process these datasets led to innovations in many areas, including:
- HR: How to attract, manage, motivate, and retain data science talent, and the role of branding and autonomy
- Cultural: How to gain acceptance within the organization and develop a more nimble and responsive culture (Lean Startup culture)
- Policy-related: Who should write mission-critical code? Who should develop models? Who should vet them?
- Organizational: Where did such a unit belong? (The centralized/decentralized dilemma)
- IT: Moving to open source and the cloud
- Leadership: What kind of leader should a data science unit have? How should it be led?
- Impact: How should results be measured? (Key performance metrics)
Innovations in all of the above areas led to the successful creation of a data science team whose portfolio has grown from 4 to more than 54 use cases in various stages of completion in the span of a year.
Richard Evans is a 28-year veteran at Statistics Canada, Institut national de la statistique et des études économiques (Insee). He’s an expert in high-frequency economic indicators, a transformative leader, and an architect and project executive of the CPI Enhancement Initiative. Richard is passionate about using data science and AI to create user-centric data products from big data sources and is a recruiter of tomorrow’s statistical leaders.