In the summer of 2017, Reuters.com embarked on an ambitious redesign of its article pages, specifically a scroll design in which articles that users request to read are immediately followed by related (or possibly unrelated) articles. The initial launch of the scroll model made recommendations based on content alone, independent of user behavior. Given the advantages of word and document embedding models and the particularities of Reuters.com content, the system was designed to use document vectors to determine article similarity. Because document vectors are learned without supervision, they need some help from supervised learning before they can be used in a production system.
James Dreiss discusses the development of the supervised topic filtering model that sits on top of the document vector model, as well as additional filtering strategies. Measuring performance of word and document vectors is notoriously difficult, but some heuristics have been developed. James offers a brief overview of measuring word and document vector performance and explains how he ultimately tackled the problem. James also details how he tested a pet theory that users would want diversity in content, especially given the wall-to-wall coverage of certain subjects, such as Donald Trump, and shares the results of serving both similarly and dissimilarly related content to users. James concludes by covering the cookie-based personalization system that was later implemented for content recommendation on article scrolls, including test results comparing the two systems.
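As a rough illustration of the content-based approach described above, the following Python sketch ranks candidate articles by the cosine similarity of their document vectors, and can reverse the ordering to serve dissimilar content, as in the diversity test. It is not the Reuters implementation: the article IDs, the three-dimensional toy vectors, and the function names `cosine_similarity` and `rank_articles` are all assumptions made for the example, and a real system would use vectors produced by a trained document embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def rank_articles(current_id, vectors, most_similar=True):
    """Rank every other article against the one being read.

    most_similar=True puts the closest articles first;
    most_similar=False puts the most dissimilar articles first.
    """
    current = vectors[current_id]
    scored = [
        (other_id, cosine_similarity(current, vec))
        for other_id, vec in vectors.items()
        if other_id != current_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=most_similar)
    return scored


# Hypothetical toy vectors; real document vectors have hundreds of dimensions.
toy_vectors = {
    "politics-1": [0.9, 0.1, 0.0],
    "politics-2": [0.8, 0.2, 0.1],
    "markets-1": [0.1, 0.9, 0.2],
    "sports-1": [0.0, 0.1, 0.9],
}

similar_first = rank_articles("politics-1", toy_vectors)
dissimilar_first = rank_articles("politics-1", toy_vectors, most_similar=False)
print([article_id for article_id, _ in similar_first])
print([article_id for article_id, _ in dissimilar_first])
```

A supervised topic filter of the kind mentioned in the abstract would sit on top of a ranking like this, for example by dropping candidates whose predicted topic is unsuitable before they are served. The abstract does not say how that filter works, so it is left out of the sketch.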
James Dreiss is a senior data scientist at Reuters. Previously, he worked at the Metropolitan Museum of Art in New York. He studied at New York University and the London School of Economics.
©2018, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com
Comments
thanks jay! looks like the slides are just available on this page
Hi James,
Caught your talk on the TWIML podcast and really enjoyed it. I went to https://conferences.oreilly.com/strata/strata-eu-2018/public/schedule/proceedings but your slides are not posted.
thanks! slides should be posted here: https://conferences.oreilly.com/strata/strata-eu-2018/public/schedule/proceedings
Nice talk! Was interesting to see how document vectors are used @ Reuters. Can you please share the slides?