Information bottlenecks are a fundamental concern in many deep learning architectures. In some circumstances, such as word vectors, the compression they enforce can be useful, but for the majority of tasks they simply cost accuracy. These bottlenecks can be alleviated by adding memory to neural networks and allowing dynamic attention over those memories. Such techniques have produced state-of-the-art results for question-answering tasks, machine translation, and even challenging computational geometry problems.
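To make the idea of dynamic attention over memory concrete, here is a minimal sketch in plain Python. It assumes dot-product scoring followed by a softmax, which is one common formulation and not necessarily the one presented in the talk; the names `attend`, `memory`, and `query` are illustrative only.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, memory):
    """Soft attention: score each memory slot against the query,
    then return the weights and the weighted average of the slots.

    Assumption: dot-product scoring; other scoring functions exist.
    """
    scores = [sum(q * m for q, m in zip(query, slot)) for slot in memory]
    weights = softmax(scores)
    dim = len(memory[0])
    context = [
        sum(w * slot[i] for w, slot in zip(weights, memory))
        for i in range(dim)
    ]
    return weights, context


# Three memory slots of dimension 2 (toy values, purely illustrative).
memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]

weights, context = attend(query, memory)
print(weights)
print(context)
```

Rather than squeezing everything into one fixed-size vector, the model keeps every slot and reads a query-dependent mixture of them. The cost is that each query must be scored against every slot, which is the kind of overhead the talk considers for production systems.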
Stephen Merity discusses the most recent techniques, what tasks they show the most promise in, when they make sense in deep learning architectures, and the underlying reasons they excel on a variety of tasks. Along the way, Stephen examines the computational costs that memory and attention mechanisms add and how they may be avoided for production systems.
Stephen Merity is a senior research scientist at MetaMind, part of Salesforce Research, where he researches and implements deep learning models for vision and text, with a focus on memory networks and neural attention mechanisms for computer vision and natural language processing tasks. Previously, Stephen worked on big data at Common Crawl, data analytics at Freelancer.com, and online education at Grok Learning. Stephen holds a master’s degree in computational science and engineering from Harvard University and a bachelor of information technology from the University of Sydney.
©2017, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • email@example.com
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse, or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.