Speech recognition with OpenSeq2Seq
Automatic speech recognition (ASR) is a core technology for building convenient human-computer interfaces, but achieving a competitive word error rate (WER) has traditionally required specialized expertise, large labeled datasets, and complex pipelines.

Jason Li and Vitaly Lavrukhin dive into how end-to-end models have simplified speech recognition and present Jasper, an end-to-end convolutional neural acoustic model that achieves state-of-the-art WER on LibriSpeech, an open dataset for speech recognition. They explore its implementation in OpenSeq2Seq, an open source, TensorFlow-based deep learning toolkit, and show how to use it for large-vocabulary speech recognition and speech command recognition. Pretrained models are provided for out-of-the-box experimentation.

Who is this presentation for?
- Researchers, engineers, and data scientists

Prerequisite knowledge
- A basic understanding of deep learning and convolutional neural networks

What you'll learn
- Discover end-to-end speech recognition and the OpenSeq2Seq deep learning toolkit
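To give a flavor of how end-to-end acoustic models of this kind produce text, here is a minimal sketch of greedy CTC decoding, the simple inference step commonly paired with convolutional CTC-trained models such as Jasper (the CTC framing is an assumption for illustration; the function and example data below are hypothetical, and production toolkits typically also offer beam-search decoding with a language model):

```python
def ctc_greedy_decode(logits, blank=0):
    """Collapse per-frame argmax predictions: merge repeats, then drop blanks."""
    # logits: one list of per-token scores for each time frame
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in logits]
    decoded = []
    prev = None
    for token in best:
        # Emit a token only when it differs from the previous frame's
        # prediction and is not the CTC blank symbol.
        if token != prev and token != blank:
            decoded.append(token)
        prev = token
    return decoded

# Hypothetical per-frame scores: blank, token 1, token 1 (repeat), blank, token 2
frames = [
    [0.9, 0.05, 0.05],
    [0.1, 0.8, 0.1],
    [0.1, 0.8, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.1, 0.8],
]
print(ctc_greedy_decode(frames))  # -> [1, 2]
```

The repeated token 1 collapses to a single emission and the blanks are dropped, which is what lets a frame-synchronous model output variable-length transcripts.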
Jason (Jing Yao) Li is a deep learning software engineer on the AI applications team at NVIDIA. He earned his BASc and MScAC at the University of Toronto working with Roger Grosse and Jimmy Ba. His research focus is on sequence-to-sequence models and speech, specifically in the domains of speech synthesis and speech recognition.
Vitaly Lavrukhin is a senior applied research scientist at NVIDIA, working on deep learning algorithms for speech and language technologies. Previously, he conducted research to solve computer vision problems with deep learning methods at Samsung R&D Institute Russia.