Presented By O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

Automatic comments moderation with ModBot at the Washington Post

Eui-Hong Han (The Washington Post), Ling Jiang (The Washington Post)
2:05pm–2:45pm Wednesday, September 27, 2017
Secondary topics:  Media, Text
Average rating: 4.50 (2 ratings)

Who is this presentation for?

  • Data scientists and product managers in journalism

Prerequisite knowledge

  • A basic understanding of natural language processing, machine learning, and text mining

What you'll learn

  • Explore ModBot, a machine learning-based tool developed by the Washington Post for automatic comments moderation


In news publishing, the comment section has a poor reputation. It’s often seen as the Wild West: a place where ugly, hateful comments drown out thoughtful, diverse criticism and discussion. At the Washington Post, the quality of online comments matters a great deal. The company wants to stimulate meaningful conversations while maintaining a civil and thoughtful comment section. A good comment section not only fosters close relationships with readers and improves user engagement and loyalty but also serves as a source of new story ideas for journalists.

Online news comments are casual free text written by readers, and evaluating such content is not a trivial task. Historically, moderation was done by humans. However, with the rapidly growing volume of online comments, the manual process consumes a vast amount of resources. Although much research has been done on automatically analyzing online comments, only a few studies have focused on journalism specifically. In addition, the criteria used for moderation vary widely depending on the problem to be solved and the specific setting.

Eui-Hong Han and Ling Jiang discuss ModBot, a machine learning-based tool developed for automatic comments moderation, and share the challenges they faced in developing ModBot and deploying it into production. ModBot contains a set of predictive models trained on tens of thousands of comments with human-moderated labels. Eui-Hong and Ling explain why they treat comments moderation as a classification task and why they engineered several additional features beyond the bag-of-words extracted from the content. You’ll learn how they built the models, refined them based on moderation criteria, and deployed ModBot in the production environment for more efficient and economical comments moderation.
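
To make the idea of "classification with bag-of-words plus engineered features" concrete, here is a minimal Python sketch. It is not ModBot's implementation: the abstract does not say which features or algorithm ModBot uses, so the three extra features (mostly uppercase, many exclamation marks, very short), the naive Bayes classifier, the names extract_features and NaiveBayesModerator, and the six toy comments are all assumptions chosen purely for illustration.

```python
import math
import re
from collections import Counter


def extract_features(comment):
    """Bag-of-words counts plus a few hypothetical engineered pseudo-tokens."""
    words = re.findall(r"[a-z']+", comment.lower())
    features = Counter(words)
    letters = [c for c in comment if c.isalpha()]
    # Assumed extra feature: more than half of the letters are uppercase.
    if letters and sum(c.isupper() for c in letters) / len(letters) > 0.5:
        features["__MOSTLY_UPPERCASE__"] += 1
    # Assumed extra feature: three or more exclamation marks.
    if comment.count("!") >= 3:
        features["__MANY_EXCLAMATIONS__"] += 1
    # Assumed extra feature: fewer than four words.
    if len(words) < 4:
        features["__VERY_SHORT__"] += 1
    return features


class NaiveBayesModerator:
    """Multinomial naive Bayes with add-one smoothing (an illustrative stand-in)."""

    def fit(self, comments, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = {label: Counter() for label in self.class_counts}
        for comment, label in zip(comments, labels):
            self.feature_counts[label].update(extract_features(comment))
        self.vocabulary = set()
        for counts in self.feature_counts.values():
            self.vocabulary.update(counts)
        return self

    def predict(self, comment):
        features = extract_features(comment)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            counts = self.feature_counts[label]
            denominator = sum(counts.values()) + len(self.vocabulary)
            score = math.log(self.class_counts[label] / total_docs)
            for token, n in features.items():
                if token in self.vocabulary:  # ignore tokens never seen in training
                    score += n * math.log((counts[token] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up training data standing in for human-moderated labels.
TRAIN_COMMENTS = [
    "Thoughtful article, thank you for the detailed reporting",
    "I disagree with the author, but the data seems solid",
    "Interesting analysis of the budget proposal",
    "YOU ARE ALL IDIOTS!!!",
    "This is garbage, stupid writer",
    "SHUT UP!!! STUPID IDIOTS",
]
TRAIN_LABELS = ["approve", "approve", "approve", "reject", "reject", "reject"]

if __name__ == "__main__":
    model = NaiveBayesModerator().fit(TRAIN_COMMENTS, TRAIN_LABELS)
    for text in ["STUPID IDIOTS!!!", "Thank you for the solid reporting"]:
        print(text, "->", model.predict(text))
```

Encoding the extra signals as pseudo-tokens lets them share one model with the word counts. A production system would more likely use a vetted library, far more data, and features tuned to the newsroom's moderation criteria.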

Eui-Hong Han

The Washington Post

Eui-Hong (Sam) Han is the director of big data and personalization at the Washington Post. Sam is an experienced practitioner of data mining and machine learning with an in-depth understanding of analytics technologies, which he has successfully applied to solve real business problems. At the Washington Post, he leads a team building an integrated big data platform to store all aspects of customer profiles and activities from both digital and print circulation, content metadata, and business data. His team is building infrastructure, tools, and services to provide a personalized experience to customers, empower the newsroom with data for better decisions, and provide targeted advertising capability. Previously, he led the Big Data practice at Persistent Systems, started the Machine Learning Group in Sears Holdings’s online business unit, and worked for a data mining startup company. Sam’s expertise includes data mining, machine learning, information retrieval, and high-performance computing. He holds a PhD in computer science from the University of Minnesota.

Ling Jiang

The Washington Post

Ling Jiang is a data scientist at the Washington Post, where she works on data mining and knowledge discovery from large volumes of data and has successfully built several data-powered products using machine learning and NLP techniques. Ling is skilled in using various machine learning and data mining techniques to tackle business problems. She holds a PhD in information science from Drexel University.