Presented By O'Reilly and Cloudera
Make Data Work
March 28–29, 2016: Training
March 29–31, 2016: Conference
San Jose, CA

Dynamic memory networks for visual and textual question answering

Stephen Merity (Salesforce Research), Caiming Xiong (MetaMind)
12:00pm–12:30pm Tuesday, 03/29/2016
Hardcore Data Science
Location: 210 C/G
Average rating: 4.15 (13 ratings)

Prerequisite knowledge

Attendees should have a general understanding of neural networks as well as familiarity with the purpose of convolutional neural networks and recurrent neural networks.


Neural network architectures with memory and attention mechanisms exhibit certain reasoning capabilities required for question answering. Stephen Merity, Richard Socher, and Caiming Xiong offer an overview of their work with the dynamic memory network (DMN), which uses both of these mechanisms to achieve state-of-the-art performance on both the Visual Question Answering dataset and the bAbI-10k text question-answering dataset. Stephen, Richard, and Caiming demonstrate how attention mechanisms make deep learning models easier to inspect, helping practitioners understand the evidence behind specific decisions. The techniques they discuss are applicable to a wide range of tasks, improving both the accuracy and interpretability of the resulting models.

Stephen Merity

Salesforce Research

Stephen Merity is a senior research scientist at Salesforce Research (formerly MetaMind), where he works on researching and implementing deep learning models for vision and text, with a focus on memory networks and neural attention mechanisms for computer vision and natural language processing tasks. Previously, Stephen worked on big data at Common Crawl, data analytics at, and online education at Grok Learning. Stephen holds a master’s degree in computational science and engineering from Harvard University and a bachelor of information technology from the University of Sydney.

Caiming Xiong


Caiming Xiong is a senior researcher at MetaMind. Before that, he was a postdoctoral researcher in the Department of Statistics at the University of California, Los Angeles. Caiming holds a PhD in computer science and engineering from SUNY Buffalo and a BS and MS from Huazhong University of Science and Technology. His research interests include deep learning, computer vision, and human-robot interaction.