Presented By O'Reilly and Cloudera
Make Data Work
March 13–14, 2017: Training
March 14–16, 2017: Tutorials & Conference
San Jose, CA

The frontiers of attention and memory in neural networks

Stephen Merity (Salesforce Research)
5:10pm–5:50pm Wednesday, March 15, 2017
Data science & advanced analytics
Location: 210 C/G Level: Intermediate
Secondary topics:  Deep learning, Hardcore Data Science
Average rating: 4.67 (3 ratings)

Who is this presentation for?

  • Data scientists, machine-learning researchers and engineers, and managers of machine-learning teams

Prerequisite knowledge

  • A general understanding of neural networks
  • Familiarity with recurrent neural networks

What you'll learn

  • Explore the use of attention and memory in neural networks, including the most recent methods from the academic community (e.g., attention in neural machine translation, pointer networks, and hierarchical attentive memory) and in Stephen Merity's own work (dynamic memory networks for visual and textual question answering, pointer sentinel mixture models)
  • Discover which tasks benefit most from deep learning architectures augmented with attention and memory methods
  • Understand the additional challenges that attention and memory methods introduce, including during training and at prediction time, and how they might be avoided


Information bottlenecks are fundamentally important to consider in many deep learning architectures. While the compression they enforce can be useful in some circumstances, such as word vectors, for most tasks it simply results in lost accuracy. These bottlenecks can be alleviated by adding memory to neural networks and allowing dynamic attention over those memories. Such techniques have produced state-of-the-art results on question-answering tasks, machine translation, and even challenging computational geometry problems.
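As a rough illustration of the idea (not code from the talk), the core of such an attention mechanism is a soft, differentiable read over a bank of memory slots: the network scores each slot against a query, normalizes the scores with a softmax, and reads back a weighted sum rather than squeezing everything through a single fixed-size vector. A minimal NumPy sketch, with all names and shapes chosen for illustration:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attend(query, memory):
    """Dot-product attention: read from `memory` (slots x dim),
    weighted by each slot's similarity to `query` (dim,)."""
    scores = memory @ query             # one similarity score per memory slot
    weights = softmax(scores)           # normalize into an attention distribution
    read = weights @ memory             # weighted sum of slots: the "soft read"
    return read, weights

rng = np.random.default_rng(0)
memory = rng.normal(size=(5, 8))                  # five memory slots, dimension eight
query = memory[2] + 0.1 * rng.normal(size=8)      # a query resembling slot 2
read, weights = attend(query, memory)
print(weights)  # the slot closest to the query should dominate the distribution
```

Because every step is differentiable, the attention weights can be trained end to end with the rest of the network, which is what lets these models learn where to look instead of compressing all context into one bottlenecked vector.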

Stephen Merity discusses the most recent techniques, the tasks in which they show the most promise, when they make sense in deep learning architectures, and the underlying reasons they excel on a variety of tasks. Along the way, Stephen examines the computational costs that memory and attention mechanisms add and how those costs can be avoided in production systems.


Stephen Merity

Salesforce Research

Stephen Merity is a senior research scientist at Salesforce Research (formerly MetaMind), where he researches and implements deep learning models with a focus on memory networks and neural attention mechanisms for computer vision and natural language processing tasks. Previously, Stephen worked on big data at Common Crawl, data analytics at, and online education at Grok Learning. Stephen holds a master’s degree in computational science and engineering from Harvard University and a bachelor of information technology from the University of Sydney.