Presented By O’Reilly and Cloudera
Make Data Work
March 5–6, 2018: Training
March 6–8, 2018: Tutorials & Conference
San Jose, CA

Executive Briefing: Building effective heterogeneous data communities—Driving organizational outcomes with broad-based data science

Frances Haugen (Pinterest), Patrick Phelps (Pinterest)
1:50pm–2:30pm Wednesday, March 7, 2018
Average rating: 4.67 (3 ratings)

Although data science as a community of practice is still quite new, in many organizations the role of extracting analytical insights has already been segregated into discrete teams of data scientists and analysts. The reasons for this are manifold. Most data sources have gotchas lurking in their semimaintained midst that can make a seemingly straightforward analysis inaccurate. Applying the appropriate statistical techniques requires more nuance than most people received in their undergraduate statistics courses, and the real patterns of the internet and many commercial systems (specifically long-tail patterns) are at best only cursorily covered. Perhaps the most important reason for this isolation is in reality social: once those pitfalls are known, any time someone outside of the data science team does an analysis, the only way to navigate the credibility gap is to ask to review that person's findings, an act that is viewed by many as confrontational and thus avoided.

This is unfortunate because all great data science findings begin with hunches, and those hunches are born of domain expertise. While data science teams may know their statistics better than other teams within an organization, they may not understand a business's customers, operations, or product dynamics as deeply. Traditionally, the gap is bridged by customers of the data science team providing scoped questions to be investigated. The best data science managers are effective interviewers who tease out intent and extract enough information that their scientists and analysts can dig into a problem space. While this is often a sufficient stopgap, allowing good-enough findings to be produced to justify the existence of a team of analytical specialists, it ends up leaving a huge amount of value on the table in terms of insights that drive business outcomes. The best analyses come not from the first question asked but often from the third or fourth that an investigator stumbles on as they dig into an analysis. Finding ways for the final consumers of quantitative findings to produce their own analyses provides a path to unlock those results.

Frances Haugen and Patrick Phelps share strategies and tools for building heterogeneous data organizations that allow people from a broader range of backgrounds to be effective producers of data-driven insights. Along the way, they explain how Jupyter notebooks can help overcome the social barriers to participation and why off-the-shelf visualization and data access tools often fall short of expectations (the flexibility versus usability problem). They also detail how Pinterest structured its company-wide data science course using a "reverse classroom" philosophy to empower people from backgrounds as diverse as product operations, engineering, product, and sales to be effective consumers (and producers) of quantitative information. Finally, they outline a roadmap for designing an organizational change plan to help make your team or organization more data-centric. The data is yours. Will you choose to unlock it?

Frances Haugen


Frances Haugen is a data product manager at Pinterest focusing on ranking content in the home feed and related pins and the challenges of driving immediate user engagement without harming the long-term health of the Pinterest content ecosystem. Previously, Frances worked at Google, where she founded the Google+ search team, built the first non-quality-based search experience at Google, and cofounded the Google Boston search team. She loves user-facing big data applications and finding ways to make mountains of information useful and delightful to the user. Frances was a member of the founding class of Olin College and holds a master’s degree from Harvard.

Patrick Phelps


Patrick Phelps is the lead data scientist on ads at Pinterest, focusing on auction dynamics and advertiser success. Previously, Patrick was the lead data scientist at Yelp, leading a team focusing on projects as diverse as search, ads, delivery operations, and HR. He has an engineering background in traffic quality (the art of distinguishing automated systems and malicious actors from legitimate users across a variety of platforms) and held an Insight Data Science fellowship. Patrick is passionate about the ability of data to provide key, quantitative insights to businesses during the decision-making process and is an advocate for data science education across all layers of a company. Patrick holds a PhD in experimental high-energy particle astrophysics.