Make Data Work
Oct 15–17, 2014 • New York, NY

Statistical Topic Modeling

Hanna Wallach (Microsoft Research NYC & University of Massachusetts Amherst)
4:00pm–4:30pm Wednesday, 10/15/2014
Hardcore Data Science
Location: E14 / E15
Average rating: 3.60 out of 5 (5 ratings)

In this talk, Hanna will give an introduction to statistical topic modeling, a state-of-the-art machine learning framework for analyzing massive document collections. Statistical topic models infer groups of semantically related words (topics) from word co-occurrence patterns in documents, without human intervention. Automated topic inference of this sort is especially useful for characterizing the semantic content of document collections so large that manual review is cost-prohibitive. The inferred topics can aid a variety of exploratory and predictive tasks, such as detecting emergent areas of innovation, tracking topic trends across languages, and identifying thematic collaborative communities.
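
To make the idea of inferring topics from word co-occurrence concrete, here is a minimal sketch, not taken from the talk. It assumes latent Dirichlet allocation (LDA), one widely used statistical topic model, fitted with a collapsed Gibbs sampler. The function name `lda_gibbs`, the toy corpus, and the hyperparameter values `alpha` and `beta` are all illustrative assumptions; a real analysis would use a tested library and a far larger collection.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01,
              iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch).

    docs is a list of documents, each a list of integer word ids.
    Returns (topic_word, doc_topic) count tables.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                # tokens per (doc, topic)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                              # tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1

                # Unnormalized conditional probability of each topic:
                # (how much this document uses the topic) times
                # (how much the topic uses this word).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]

                # Record the newly sampled assignment.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return topic_word, doc_topic


# Hypothetical toy corpus: two documents about biology, two about law.
vocab = ["gene", "dna", "cell", "court", "law", "judge"]
word_id = {w: i for i, w in enumerate(vocab)}
raw_docs = [
    "gene dna cell dna gene",
    "cell gene dna cell",
    "court law judge law",
    "judge court law court judge",
]
docs = [[word_id[w] for w in text.split()] for text in raw_docs]

topic_word, doc_topic = lda_gibbs(docs, num_topics=2, vocab_size=len(vocab))

# Show the three most frequent words assigned to each topic.
for k, counts in enumerate(topic_word):
    top = sorted(range(len(vocab)), key=lambda w: -counts[w])[:3]
    print(k, [vocab[w] for w in top])
```

The sampler never sees a label for any document. Words that tend to appear together are pulled toward the same topic only because each token's topic is resampled in proportion to how often its document uses that topic and how often that topic uses that word. On a corpus this small the result depends on the random seed, so the printed topics should be read as an illustration rather than a guaranteed outcome.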

Hanna will provide an overview of the mathematical and computational ideas that underlie this class of models, and discuss several recently developed models and inference algorithms. Finally, she will present a variety of case studies in applying statistical topic models to real-world document collections, ranging from formerly classified government documents to email communication networks, in order to showcase their power and flexibility as reliable analysis tools for data science practitioners and decision-makers.

Hanna Wallach

Microsoft Research NYC & University of Massachusetts Amherst

Hanna Wallach is a researcher at Microsoft Research in New York City
and an assistant professor at the University of Massachusetts
Amherst’s School of Computer Science, where she is one of five core
faculty members involved in UMass’s recently formed Computational
Social Science Initiative. Hanna develops new machine learning methods
for analyzing the structure, content, and dynamics of complex social
processes, such as the US political system, the US patent system, and
software development communities. Her research contributes to machine
learning, Bayesian statistics, and, in collaboration with social
scientists, to the nascent field of computational social science. Her
work on infinite belief networks won the best paper award at AISTATS
2010. Hanna has organized several workshops on Bayesian latent
variable modeling and computational social science. She also
co-founded the annual Women in Machine Learning Workshop. Hanna holds
a B.A. in Computer Science from the University of Cambridge, an
M.S. in Cognitive Science and Machine Learning from the University of
Edinburgh, and a Ph.D. in Physics from the University of Cambridge.