What Makes Us Human? A Tale of Advertising Fraud

Claudia Perlich (Dstillery)
Grand Ballroom

We’ve all taken road trips, right? So imagine driving for 24 hours straight and passing a billboard every three seconds. Now imagine someone hijacks your car, blindfolds you, and ties you down in the passenger seat, and you proceed on your road trip, oblivious to the onslaught of billboard messages zooming past. That, my friends, is a peek into the disturbing phenomenon of forced website visitation. Allow me to explain.

Back in spring 2012, shortly after releasing a modification to one of our models for targeting prospects in display advertising, we began seeing a sudden increase in one of our early diagnostic metrics. Historically, an increase in this metric indicated better performance of our production models. “Great!” we all thought. But the metric continued to increase, and increase, and increase, to the point where performance appeared to have doubled in just two weeks. Uh-oh. This looked too good to be true, and one thing you learn when dealing with data is that if it looks too good to be true, it almost always is.

We decided to look under the hood, and a couple of things started to look really strange: 1) the most predictive URLs had all only recently appeared in our data, 2) they were equally predictive no matter whether we were advertising for a hotel chain, pizza, or running shoes, and 3) the co-visitation patterns between the sites seemed excessive and very unnatural. Apparently there was a new breed of websites that were simply passing traffic around, sometimes at alarming rates, in order to monetize that traffic in the real-time bidding exchanges.

In fact, one web viewer was involved in over 30,000 auctions in a single day. That is the equivalent of one advertisement every three seconds for a twenty-four-hour period, loosely akin to the aforementioned road trip from hell. These users sometimes “do” this (unbeknownst to them, of course) for a week or more at a time. They are typically shepherded from a Chinese movie site, to a new mother’s site, to a fashion site, back to the new mother’s site, to an auto site, and to many other sites with various content types throughout the day. In many situations, these browsing patterns cannot be explained by human behavior. They are seen across websites that were created solely for the purpose of fabricating specious visits, selling traffic to other websites, and taking a large share of advertising budgets, indiscriminately racing blindfolded browsers through the web and (not) exposing them to invisible brand messages along the way. The straw that finally broke our camel’s back (and made us notice this) was that not only are browsers traversing these networks of artificial sites, they are also occasionally sent to YOUR brand page, increasing their advertising value for retargeting by an order of magnitude and generating even higher auction prices in the subsequent round.
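To make the "every three seconds" arithmetic concrete, here is a minimal Python sketch. It also shows one simple way such browsers could be surfaced from an auction log by counting auctions per browser per day. The log format, the function names, and the threshold are all hypothetical and chosen for illustration; this is not a description of the detection method actually used in the work described here.

```python
from collections import Counter

SECONDS_PER_DAY = 24 * 60 * 60


def seconds_between_auctions(auctions_per_day):
    """Average gap in seconds if the auctions were spread evenly over one day."""
    return SECONDS_PER_DAY / auctions_per_day


def flag_high_volume_browsers(auction_log, max_auctions_per_day):
    """Return the browser IDs that exceed the daily auction threshold.

    auction_log is assumed to be an iterable of (browser_id, day) pairs,
    one pair per auction the browser took part in (a hypothetical format).
    """
    counts = Counter(auction_log)  # counts auctions per (browser_id, day)
    return {
        browser
        for (browser, _day), n in counts.items()
        if n > max_auctions_per_day
    }


# 86,400 seconds in a day divided by 30,000 auctions
print(round(seconds_between_auctions(30_000), 2))  # 2.88
```

A raw count like this is only a first filter. The other two signals mentioned above, URLs that are equally predictive across unrelated campaigns and unnatural co-visitation between sites, would need their own checks.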

No matter what attitude you may have towards the advertising world and the obviously amoral nature of this business, consider this: if a large proportion of the users coming to your site had no intention of being there and no awareness that it happened to them, what good is any insight from your web analytics?

Claudia Perlich


Claudia Perlich serves as Chief Scientist at m6d, and in this role designs, develops, analyzes, and optimizes the machine learning that drives digital advertising to prospective customers of brands. An active industry speaker and frequent contributor to industry publications, Claudia enjoys acting as a guide in the world of data. She was recently named winner of the Advertising Research Foundation’s (ARF) Grand Innovation Award and was selected as a member of the Crain’s NY annual 40 Under 40 list. She has published numerous scientific articles, holds multiple patents in machine learning, and has won many data mining competitions. Prior to joining m6d in February 2010, Claudia worked in Data Analytics Research at IBM’s Watson Research Center, concentrating on data analytics and machine learning for complex real-world domains and applications. Claudia has a PhD in Information Systems from NYU and an MA in Computer Science from Colorado University. Claudia takes an active interest in the making of the next generation of data scientists and teaches “Data Mining for Business Intelligence” in the NYU Stern MBA program.


Contact Us

View a complete list of Strata + Hadoop World 2013 contacts