In news publishing, the comment section has a bad reputation. It’s often seen as the Wild West, a place where ugly, hateful comments drown out thoughtful, diverse criticism and discussion. At the Washington Post, the quality of online comments is very important. The company wants to stimulate meaningful conversations while maintaining a civil and thoughtful comment section. A good comment section not only fosters close relationships with readers and improves user engagement and loyalty but also serves as a source of new story ideas for journalists.
Online news comments are casual free text written by readers, and evaluating such content is not a trivial task. Historically, moderation was done by humans. However, with the rapidly growing volume of online comments, manual moderation consumes a vast amount of resources. Although much research has been done on automatically analyzing online comments, only a few studies have focused on journalism specifically. In addition, the criteria used for moderation vary considerably depending on the problem to be solved and the specific setting.
Eui-Hong Han and Ling Jiang discuss ModBot, a machine learning-based tool developed for automatic comment moderation, and share the challenges they faced in developing ModBot and deploying it into production. ModBot contains a set of predictive models trained on tens of thousands of comments with human-moderated labels. Eui-Hong and Ling explain why they treat comment moderation as a classification task and why they engineered several additional features beyond the bag-of-words features extracted from the content. You’ll learn how they built the models, refined them based on moderation criteria, and deployed ModBot in the production environment for more efficient and economical comment moderation.
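Treating moderation as classification means each comment is turned into a feature vector and a model predicts whether it should be approved or rejected. The sketch below is a minimal, hypothetical illustration of that framing, not ModBot's actual implementation: it combines bag-of-words counts with three engineered features chosen here purely for illustration (log length, uppercase ratio, exclamation count) and trains a plain logistic-regression classifier with stochastic gradient descent in standard-library Python. The function names, toy comments, and labels are invented; the talk abstract does not say which extra features or which model ModBot uses.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z']+")


def bag_of_words(text):
    """Lowercased token counts for one comment (the bag-of-words part)."""
    return Counter(TOKEN_RE.findall(text.lower()))


def extra_features(text):
    """Hypothetical engineered features beyond bag-of-words (illustrative only)."""
    letters = [c for c in text if c.isalpha()]
    upper_ratio = sum(c.isupper() for c in letters) / len(letters) if letters else 0.0
    return {
        "f:log_len": math.log1p(len(text)),   # comment length, log-scaled
        "f:upper_ratio": upper_ratio,         # share of uppercase letters ("shouting")
        "f:exclaim": float(text.count("!")),  # number of exclamation marks
    }


def featurize(text):
    """Merge word counts and engineered features into one sparse feature dict."""
    feats = {f"w={w}": float(c) for w, c in bag_of_words(text).items()}
    feats.update(extra_features(text))
    return feats


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(comments, labels, epochs=200, lr=0.1):
    """Binary logistic regression via SGD; label 1 = reject, 0 = approve (assumed)."""
    weights, bias = {}, 0.0
    data = [(featurize(t), y) for t, y in zip(comments, labels)]
    for _ in range(epochs):
        for feats, y in data:
            z = bias + sum(weights.get(k, 0.0) * v for k, v in feats.items())
            err = sigmoid(z) - y  # gradient of log loss with respect to z
            bias -= lr * err
            for k, v in feats.items():
                weights[k] = weights.get(k, 0.0) - lr * err * v
    return weights, bias


def predict_proba(model, text):
    """Estimated probability that a comment should be rejected."""
    weights, bias = model
    feats = featurize(text)
    return sigmoid(bias + sum(weights.get(k, 0.0) * v for k, v in feats.items()))


if __name__ == "__main__":
    # Invented toy data standing in for human-moderated labels.
    comments = [
        "Thoughtful analysis, thanks for the sources.",
        "Great reporting on the budget vote.",
        "YOU ARE ALL IDIOTS!!!",
        "idiots everywhere, shut up!!",
    ]
    labels = [0, 0, 1, 1]
    model = train(comments, labels)
    for text in ["SHUT UP IDIOTS!!!", "Thanks for the thoughtful reporting."]:
        print(f"{predict_proba(model, text):.2f}  {text}")
```

In a real system the toy lists would be replaced by a large labeled corpus, and the decision threshold on the predicted probability would be tuned to the moderation criteria in use.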
Eui-Hong (Sam) Han is the director of big data and personalization at the Washington Post. Sam is an experienced practitioner of data mining and machine learning and has an in-depth understanding of analytics technologies. He has successfully applied these technologies to solve real business problems. At the Washington Post, he leads a team building an integrated big data platform to store all aspects of customer profiles and activities from both digital and print circulation, content metadata, and business data. His team is building infrastructure, tools, and services to provide personalized experiences to customers, empower the newsroom with data for better decisions, and provide targeted advertising capability. Previously, he led the Big Data practice at Persistent Systems, started the Machine Learning Group in Sears Holdings’ online business unit, and worked at a data mining startup. Sam’s expertise includes data mining, machine learning, information retrieval, and high-performance computing. He holds a PhD in computer science from the University of Minnesota.
Ling Jiang is a data scientist at the Washington Post, where she works on data mining and knowledge discovery from large volumes of data and has successfully built several data-powered products using machine learning and NLP techniques. Ling is skilled in using various machine learning and data mining techniques to tackle business problems. She holds a PhD in information science from Drexel University.
©2017, O'Reilly Media, Inc.