Today’s newspaper publishers face challenges that would never have occurred to their predecessors. In online publishing, editors must constantly monitor the traffic on their newspaper sites and make quick decisions about the content of these publications. Promotional platforms such as search engines and social media make the task more complex still: an editor must take in a vast amount of information to predict how a piece of content will affect website traffic and, on that basis, decide which articles to promote from one minute to the next. Even by publishing standards, this is a daunting task.
Like many publishers, News UK, the publisher of The Sun, The Times and The Sunday Times, found this a major challenge and felt it could do better. The organisation decided to use its data to its advantage, choosing to focus on analysing article content. Content science had previously received less attention than customer science, but News UK believed that analysing article content was an application of data science that could reveal insights valuable to the business. In this presentation, we will describe a series of partnerships between Pivigo and News UK in which teams of data scientists-in-training embarked on five-week data science projects aimed at using advanced analytics and machine learning methodologies to generate insight and help News UK make better-informed decisions.
To better understand how differences in content affect article performance, one team examined what determines the lifetime of an article. In other words, they wanted to understand what factors determine how much traffic an article generates, and what the trajectory of that article’s popularity might look like over time. The team found that some articles produce a lot of web traffic very quickly after publication, but lose popularity within a few hours. Other articles, in contrast, gain popularity slowly but continue to generate website traffic long after their publication. By understanding which factors determined these trajectories, the News UK editors could make accurate predictions about the traffic pattern produced by a given article and could, therefore, make better-informed decisions about which articles to promote on social media.
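As an illustration only, the two trajectory shapes described above can be told apart with a very simple measure: how many hours an article takes to collect half of its observed traffic. The sketch below is not the team's model. The function names, the three-hour threshold and the hourly page-view figures are all hypothetical.

```python
from typing import Sequence


def hours_to_fraction(hourly_views: Sequence[float], fraction: float = 0.5) -> int:
    """Hours after publication needed to accumulate `fraction` of total observed views."""
    total = sum(hourly_views)
    if total <= 0:
        raise ValueError("series has no traffic")
    running = 0.0
    for hour, views in enumerate(hourly_views, start=1):
        running += views
        if running >= fraction * total:
            return hour
    return len(hourly_views)


def label_trajectory(hourly_views: Sequence[float], threshold_hours: int = 3) -> str:
    """Label a series 'fast-burn' or 'slow-burn' (threshold is an arbitrary assumption)."""
    if hours_to_fraction(hourly_views) <= threshold_hours:
        return "fast-burn"
    return "slow-burn"


# Made-up hourly page views for two hypothetical articles.
spike_article = [900, 500, 200, 80, 40, 20, 10, 5]
slow_article = [50, 60, 80, 90, 100, 100, 95, 90]

for name, series in [("spike", spike_article), ("slow", slow_article)]:
    print(name, hours_to_fraction(series), label_trajectory(series))
```

A real analysis would relate a measure like this to article features such as section, topic or time of publication. This sketch only shows how a trajectory might be summarised in a single number.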
To complement this work, a second team investigated how an article’s content affects its popularity and how different types of articles respond to promotion on social media. Their initial studies found some clues in metadata such as the section of the newspaper, the time of promotion and the day of the week. To improve on this, the team turned to natural language processing (NLP) techniques. Their analysis revealed patterns in the ways that different aspects of an article’s text, such as named entities, topics and sentiment, combine to produce different levels of readership response. Moreover, they found that this response differed between social media platforms: for example, promotion on Facebook produced a different response from promotion on Twitter. The team therefore created a bespoke analytics tool that editors could use to predict which articles might respond best to promotion on social media and which platform would likely produce the greatest effect.
During these studies, the data team at News UK noticed periodic deviations from normal patterns of website traffic. Their traditional analytics tools could surface these deviations, but the tools were cumbersome to operate and could not explain the deviations in terms that a non-specialist could use. The team needed a better way to detect and explain them. Again, they turned to data science and a third team of Pivigo data scientist trainees. This team built an anomaly detection system designed to examine time series data generated from digital clickstreams, identify abnormal trends and explain anomalies in natural language. Early tests showed that the system could capture anomalous spikes in traffic caused by atypical popularity of some content, allowing editors to quickly understand which topics were trending at higher-than-expected levels. The team also found that the tool has another, unexpected function: it can flag errors in the data tracking in a matter of seconds, often before any built-in warnings respond. The system gives the editorial team at News UK a powerful tool that helps them quickly understand the impact of their content and make decisions in a far more agile way than was previously possible.
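To make the idea concrete, here is a minimal sketch of one common approach to this kind of problem: a trailing-window z-score over traffic counts, paired with a plain-English summary of each flagged point. The abstract does not say which method the team used, so this is an assumption throughout. The function names, window size, threshold and traffic figures are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple

Anomaly = Tuple[int, float, float, float]  # (index, value, recent mean, z-score)


def detect_anomalies(counts: Sequence[float], window: int = 6,
                     threshold: float = 3.0) -> List[Anomaly]:
    """Flag points more than `threshold` standard deviations from the trailing-window mean."""
    anomalies = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            continue  # flat history: z-score undefined, so skip this point
        z = (counts[i] - mu) / sigma
        if abs(z) > threshold:
            anomalies.append((i, counts[i], mu, z))
    return anomalies


def explain(anomaly: Anomaly, label: str = "page views") -> str:
    """Turn a flagged point into a sentence a non-specialist can read."""
    i, value, mu, z = anomaly
    direction = "above" if z > 0 else "below"
    if mu == 0:
        return f"At interval {i}, {label} were {direction} the recent average of zero."
    pct = abs(value - mu) / mu * 100
    return (f"At interval {i}, {label} were {pct:.0f}% {direction} "
            f"the recent average ({value} against about {mu:.1f}).")


# Made-up counts per interval, with one spike at index 7.
traffic = [100, 102, 98, 101, 99, 100, 103, 250, 101, 99]

for found in detect_anomalies(traffic):
    print(explain(found))
```

The same check flags sudden drops as well as spikes, which is one way a tracking failure could show up. A plain trailing window is sensitive to recent outliers, so a production system would likely need something more robust, including handling of daily and weekly seasonality.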
Today the outputs of the three projects are making their way into common usage across the newsrooms, alongside other algorithms developed in-house by the much larger data science team. They complement the expertise of editors and journalists and support their decision making. This work underscores the power of data science and shows how even a small team of data scientists can bring real value to a large, data-rich organisation in a short period of time.
A former Data Science Consultant who has worked with several publishers in the UK and US.
My work involves a combination of consultancy and hands-on work. As a consultant, I work with business partners to develop data science solutions that make the most of their data, including in-depth analysis of existing data and predictive analytics for future business needs. My hands-on work includes programming, mentoring and managing teams of data scientists on projects in a wide variety of business domains.
©2018, O’Reilly UK Ltd • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.