AI competitions provide a venue for comparing alternative techniques head-to-head and, as such, are an important tool for gauging progress in AI. But what ingredients distinguish the top AI systems for a given task from the rest? And which competitions are the most relevant indicators of real progress?
Over the last year, Steve Rennie and his colleagues have significantly advanced the state of the art on two flagship challenges in AI: the Switchboard evaluation benchmark for automatic speech recognition and the MSCOCO Image Captioning Challenge. Steve surveys the deep learning innovations, as measured by their proven performance on benchmark tasks such as Switchboard (ASR), ImageNet and MSCOCO (computer vision), SQuAD (natural language processing), and WMT (machine translation), that have most advanced results on these and other benchmarks. He covers specific but broadly applicable techniques, including annealed dropout (AD), a model ensembling trick, and self-critical sequence training (SCST), a simple reinforcement learning (RL) innovation. Beyond simply describing what these innovations are, Steve dives into how and why they work and where they point in terms of directions for future innovation. While some of these innovations are task specific, many can be applied seamlessly to other application domains, including yours.
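To make one of the named techniques concrete: annealed dropout gradually reduces the dropout rate over the course of training, so the network behaves increasingly like the full (implicit) ensemble by the end. The sketch below is a minimal, hedged illustration of that idea in NumPy; the linear schedule, the parameter names, and the helper functions are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def annealed_dropout_rate(epoch, total_epochs, p0=0.5):
    """Linearly anneal the dropout rate from an initial p0 toward zero.

    This linear schedule is one simple choice; the original work may
    use a different annealing curve.
    """
    return max(0.0, p0 * (1.0 - epoch / total_epochs))

def dropout(x, p, rng):
    """Inverted dropout: zero units with probability p, rescale the rest
    so the expected activation is unchanged."""
    if p <= 0.0:
        return x
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

# Illustrative training-loop fragment (model update omitted):
rng = np.random.default_rng(0)
activations = np.ones(8)
for epoch in range(5):
    p = annealed_dropout_rate(epoch, total_epochs=5)
    h = dropout(activations, p, rng)  # dropout weakens as training proceeds
```

By the final epochs the dropout rate approaches zero, so training transitions smoothly from a heavily regularized regime to fine-tuning the full network.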
Steven Rennie is the director of research at Fusemachines, an AI solutions and services company whose mission is to make AI accessible to everyone through education, software, and services. Previously, Steve worked at the IBM TJ Watson Research Center, where he led the Multimodal Group in the Watson Division. He has published over 50 peer-reviewed papers on machine learning and AI applications, including source separation, robust automatic speech recognition (ASR), multitalker speech recognition, LVCSR, graphical models, data-driven computational auditory scene analysis, machine translation, probabilistic array processing, reinforcement learning, and image captioning. He has served as a committee member for a number of leading conferences and journals, including ICLR, AISTATS, ACL, COLING, SIGGRAPH, INTERSPEECH, ICASSP, ASRU, ICML, NIPS, and TASL. Steve was recently elected to the IEEE's prestigious Speech and Language Technical Committee (SLTC) and has advanced the state of the art on several AI challenges, including the Pascal Speech Separation and Recognition Challenge, the Aurora 4 noise-robust ASR database, the Switchboard LVCSR evaluation benchmark, and most recently, the MSCOCO Image Captioning Challenge. He holds a PhD in electrical and computer engineering from the University of Toronto, with a dissertation titled Graphical Models for Speech Recognition in Adverse Environments. His primary research interest is developing novel, practical algorithms for information processing that leverage graphical modeling and deep, reinforcement, and adversarial learning techniques.
©2018, O'Reilly Media, Inc.