Deep learning is fast becoming a crucial aspect of businesses. Since deep learning systems are often more sensitive to data cleanliness, clean, high-quality training data is increasingly important. When organizations first begin with machine learning systems, their data needs are often adequately served by third-party platforms like Mechanical Turk or CrowdFlower. However, as these systems grow in complexity or the scale of problems changes, the needs of development (or operational) teams outgrow these off-the-shelf solutions.
Pinterest started using human evaluation the way most companies start: the company outsourced template functionality, job creation, and task completion to third-party companies and their worker pools, relying on these companies to provide high-quality crowds and trustworthy quality controls. After following this path for two years, Pinterest decided to pull much of this process in-house, both because it identified a growing spammer community among the crowds it relied on and because it had to introduce more and more third-party companies to fulfill the unique template demands of its team.
In 2016, Pinterest launched its own internal human evaluation platform, Sofia, which allowed the company to leverage its own engineers to build advanced template functionality and take worker quality matters into its own hands. Since the move, Pinterest has seen a significant increase in worker quality and in trust in human evaluation results, along with a decrease in costs.
Veronica Mapes and Garner Chung offer an overview of Sofia. Along the way, they cover tricks for increasing data reliability and judgement reproducibility and explain how Pinterest integrated end-user-sourced judgements into its in-house platform. You’ll learn what human evaluation is and its major applications; the lessons Pinterest learned building its human evaluation platform; and where the field of human evaluation is going in the tech world. You’ll also discover the challenges Pinterest faced while training internal teams to write effective human-computation tasks and factors to consider when building out in-house human-computation resources.
Veronica Mapes is a technical program manager focused on human evaluation and computation at Pinterest, where she manages third-party communities of crowdsourcing raters and Pinterest’s internal human evaluation platform, which she matured from just an idea to a self-service platform with an annual run rate of 10 million tasks less than six months after launch. She also hires, trains, and manages high-quality content evaluators and tests template and worker quality to ensure the delivery of highly accurate data for time series measurement and training machine learning models.
Garner Chung is the engineering manager of the human computation team and the data science team supporting core product, growth, and infrastructure at Pinterest. Previously, he managed the data science team at Opower, where he drove efforts to research and productionize predictive models for all of product and engineering. Many years ago, he studied film at UC Berkeley, where he learned to deconstruct and complicate misleadingly simple narratives. Over the course of his 20 years in the tech industry, he has witnessed exuberance over technology’s great promise ebb and flow, all the while remaining steadfast in his gratitude for having played some small part. As a leader, Garner has learned to drive teams that privilege responsibility and end-to-end ownership over arbitrary commitments.
©2018, O'Reilly Media, Inc.