Presented By O'Reilly and Cloudera
Make Data Work
Feb 17–20, 2015 • San Jose, CA

Beyond DNNs towards New Architectures for Deep Learning, with Applications to Large Vocabulary Continuous Speech Recognition

Tara Sainath (Google)
9:05am–9:45am Wednesday, 02/18/2015
Hardcore Data Science
Location: LL20 BC.
Average rating: 4.50 (4 ratings)

In the past few years, we have seen a paradigm shift in the speech recognition community towards using deep neural networks (DNNs). DNNs were first explored for acoustic modeling, where numerous research labs demonstrated relative improvements in word error rate (WER) of 10–40%. In this talk, I will provide an overview of the latest improvements in deep learning across various research labs since that initial breakthrough. These include alternative neural network architectures, such as Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNNs), which have yielded additional gains over DNNs for acoustic modeling. In addition, I’ll discuss how deep learning can be used in other parts of the recognition process, from feature learning to better modeling of the classes we want to predict.
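To make the LSTM architecture mentioned above concrete, here is a minimal NumPy sketch (not from the talk, and not Google's implementation) of a single LSTM cell step, the building block of LSTM-RNN acoustic models: the gates control what the cell remembers across successive frames of acoustic features. All sizes and names are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, b):
    """One LSTM time step (illustrative, single combined weight matrix).
    x: input frame (n_in,); h, c: previous hidden/cell state (n_hid,)
    W: weights (n_in + n_hid, 4 * n_hid); b: bias (4 * n_hid,)
    """
    z = np.concatenate([x, h]) @ W + b
    i, f, g, o = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # input/forget/output gates
    g = np.tanh(g)                                # candidate cell update
    c_new = f * c + i * g                         # gated memory update
    h_new = o * np.tanh(c_new)                    # new hidden state
    return h_new, c_new

# Run a toy utterance of 40-dim feature frames (e.g., log-mel filterbanks)
# through the cell; in a real acoustic model, h would feed a softmax over
# context-dependent HMM states at every frame.
rng = np.random.default_rng(0)
n_in, n_hid, T = 40, 8, 5
W = rng.standard_normal((n_in + n_hid, 4 * n_hid)) * 0.1
b = np.zeros(4 * n_hid)
h = c = np.zeros(n_hid)
for t in range(T):
    h, c = lstm_step(rng.standard_normal(n_in), h, c, W, b)
print(h.shape)  # (8,)
```

The forget gate `f` is what lets the cell carry context across many frames, which is the main advantage LSTMs offer over plain DNNs for sequence data like speech.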


Tara Sainath


Tara Sainath received her PhD in Electrical Engineering and Computer Science from MIT in 2009. The main focus of her PhD work was acoustic modeling for noise-robust speech recognition. After her PhD, she spent five years in the Speech and Language Algorithms group at the IBM T.J. Watson Research Center before joining Google Research. She co-organized a special session on Sparse Representations at Interspeech 2010 in Japan and organized a special session on Deep Learning at ICML 2013 in Atlanta. In addition, she is a staff reporter for the IEEE Speech and Language Processing Technical Committee (SLTC) Newsletter. Her research interests are mainly in acoustic modeling, including deep neural networks and sparse representations.