The world is experiencing an Industrial Revolution of Data. In any
given minute the machines around us are tracking billions of mouse
clicks, credit card swipes, and GPS coordinates. And increasingly
this data is being saved, aggregated, and analyzed. These massive
data flows present big challenges to firms, but also new opportunities
for deriving insights.
A new class of professionals, called data scientists, has
emerged to address the Big Data revolution. In this talk, I
first discuss three skills core to their workflow: munging,
modeling, and visualization. Then I present a case study that
applies these skills: the analysis of billions of call records to
address customer churn at a North American telecom.
Munging is the process of transforming large data sets into a form
suitable for analysis; this is often the most labor-intensive of the
three steps. Modeling refers to the application of statistical
learning to identify patterns or make predictions using features of
the data. Data visualization is how these models are presented to
The case study begins with a data set of several billion call records
spread across millions of customers. This data was first munged to
describe frequent calling networks. We next modeled how events
propagated within these networks and found that customers with a
cancellation event in their network were 700% more likely to
terminate service than at baseline. Finally, we visualized this
analysis by showing how
cancellations spread in one metropolitan call network.
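As a rough illustration of the munging and modeling steps in the case study, the sketch below builds a frequent-calling network from raw call records and compares churn among customers who had a cancelled contact against everyone else. It is a minimal sketch on invented toy data, not the method or data used in the talk: the record layout, the `MIN_CALLS` threshold, the two-period split of cancellations, and all function names are assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical toy call records as (caller, callee) pairs; not data from the talk.
calls = [
    ("a", "b"), ("a", "b"), ("b", "a"),
    ("a", "c"), ("a", "c"),
    ("d", "e"), ("d", "e"),
    ("e", "f"), ("e", "f"),
    ("g", "h"),
]
customers = set("abcdefgh")
cancelled_early = {"b"}        # assumed: cancellations seen in an earlier period
cancelled_later = {"a", "e"}   # assumed: cancellations seen in a later period

MIN_CALLS = 2  # assumed threshold for calling a pair "frequent"


def frequent_network(calls, min_calls):
    """Munging: collapse raw records into an undirected graph of frequent pairs."""
    counts = Counter(frozenset(pair) for pair in calls if pair[0] != pair[1])
    network = defaultdict(set)
    for pair, n in counts.items():
        if n >= min_calls:
            u, v = tuple(pair)
            network[u].add(v)
            network[v].add(u)
    return network


def churn_rates(network, customers, cancelled_early, cancelled_later):
    """Modeling: later churn rate for exposed customers versus all others."""
    active = customers - cancelled_early
    exposed = {c for c in active if network.get(c, set()) & cancelled_early}
    unexposed = active - exposed

    def rate(group):
        return len(group & cancelled_later) / len(group) if group else float("nan")

    return rate(exposed), rate(unexposed)


network = frequent_network(calls, MIN_CALLS)
exposed_rate, baseline_rate = churn_rates(
    network, customers, cancelled_early, cancelled_later
)
# Relative increase in churn for exposed customers, as a percentage.
lift_pct = (exposed_rate / baseline_rate - 1) * 100
print(f"exposed={exposed_rate:.2f} baseline={baseline_rate:.2f} lift={lift_pct:.0f}%")
```

A comparison like this only shows association on the toy data; at the scale of billions of records, the same grouping would typically run in a database or distributed framework rather than in memory.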
Michael Driscoll has a decade of experience developing large-scale databases and predictive algorithms for digital media, financial, and life sciences firms. He is the CEO and co-founder of Metamarkets and Chairman of Dataspora LLC, a big data and analytics consultancy he founded in 2007. Previously, he founded the online retailer CustomInk.com and worked as a software engineer for the Human Genome Project. Michael holds a Ph.D. in Bioinformatics from Boston University and an A.B. from Harvard College.