Make Data Work
Oct 15–17, 2014 • New York, NY

Using Data Science on Internet Search Behavior as a Proxy for Human Behavior

Juan Miguel Lavista (Microsoft)
5:05pm–5:45pm Thursday, 10/16/2014
Data Science
Location: 1D
Average rating: ***..
(3.43, 7 ratings)
Slides:   1-PPTX 

In the US alone, we make more than 40 billion search queries every month. From the moment we wake up, using a search engine is one of the most common things we do online. This talk shows some examples of how this data can be used.

In April 2014, Nate Silver published a blog post examining how much major US cities differ in when people arrive at work. Using data collected by the American Community Survey, Silver showed the average time people say they arrive at work across the country. Inspired by this analysis, we thought it would be interesting to look at Bing usage as a proxy for when people get online, start work, or otherwise wake up. We found many similarities with Silver's study. One clear finding is that New York City, the city that never sleeps, actually wakes up 43 minutes later than San Francisco.

Search data can also be used for more important tasks. For example, adverse drug events that cause mortality are often discovered only after a drug comes to market. A team of scientists from Microsoft Research, Columbia, and Stanford University hypothesized that Internet users may provide early clues about adverse drug events through their online information-seeking. They conducted a large-scale study of Web search activity and found that anonymized signals on drug interactions can be mined from search logs. Compared to analyses of other sources such as electronic health records (EHR), logs are inexpensive to collect and mine. The results demonstrate that logs of the search activities of populations of computer users can contribute to drug safety surveillance.

Juan Miguel Lavista


Juan Lavista is a Principal Data Scientist at Microsoft, where he works with a team of data scientists searching for insights in petabytes of data. Juan joined Microsoft to work for the Microsoft Experimentation Platform (EXP), where he designed and ran controlled experiments across different Microsoft properties. Before joining Microsoft, Juan was a CTO and cofounder. He has been a speaker at conferences in many countries, including the US, Canada, Argentina, Colombia, and Uruguay, and he was also a TEDx speaker in 2010 in Argentina.

Twitter account: @BDataScientist