The glitching, buzzing artificial voices synonymous with text-to-speech (TTS) have been a problem for over five decades, and the issue is becoming more prominent as the number of smartphones and smart speakers in use grows. WaveNet is real, and the end product is clearly far more advanced than its rivals.
Peter Cahill explains why WaveNet will be the next generation of recognition, synthesis, and voice-activity detection as he takes you through the history of speech synthesis and recognition, details the breakthroughs that have been made in TTS, and demonstrates how to take advantage of these advances in speech and language technology.
Peter Cahill is the founder and CEO of Voysis. He has over 15 years’ experience in speech technology and neural network R&D. Previously, Peter was part of a group of scientists that attracted a total of $117M in funding for ADAPT (formerly CNGL), a dynamic research center that combines leading academic researchers with key industry partners to produce groundbreaking digital content innovations. Peter is an active member of the speech research community; he chairs SynSIG, the global speech synthesis special interest group, and serves as a reviewer for all leading journals and conferences in his field. He holds a PhD from University College Dublin, where he was also a faculty member.
©2018, O’Reilly UK Ltd