Deep learning has become the de facto standard method for most computer vision problems since its breakthrough in the 2012 ImageNet Challenge. In the past few years, with deeper and more complex neural network architectures, deep learning algorithms have matched and even exceeded human-level performance in image recognition. Computer vision applications are increasingly adopting deep learning technologies, and many of them are achieving great success. Nevertheless, training deep learning networks on a large dataset remains very challenging. The sheer amount of computation is one obstacle: training a convolutional neural network on the Sports-1M dataset, for instance, can take months. The community therefore needs tools that help train deep learning networks on multiple servers with multiple GPUs.
Anusua Trivedi offers an overview of the Microsoft Cognitive Toolkit, also known as CNTK. CNTK was originally designed for speech processing tasks and was first released under a relatively restrictive license on CodePlex in April 2015. In February 2016, CNTK moved to GitHub under the much friendlier MIT License. In November 2016, CNTK 2.0 was released with both C++ and Python APIs. The Cognitive Toolkit is cross-platform and runs on both Windows and Linux with no performance trade-offs.
Many deep learning toolkits are used in the vision community, including Caffe, Torch, Theano, TensorFlow, and MXNet. CNTK has distinct advantages over the others, especially in speed and scalability. It was key to Microsoft Research’s recent achievement of human parity in conversational speech recognition, and it has been used extensively inside Microsoft for image, text, and speech data, with each area benefiting from the built-in scalability. Anusua compares five well-known toolkits to demonstrate how CNTK achieves almost linear scalability, far better than the other toolkits in the comparison. CNTK achieves this scalability through advanced algorithms such as 1-bit SGD and block-momentum SGD, which Anusua explains in detail.
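As a rough illustration of the idea behind 1-bit SGD, the Python sketch below quantizes a gradient vector to one bit per element before it would be exchanged between workers, and carries the quantization error forward into the next step (error feedback). This is not CNTK's implementation: the function names are hypothetical, and using the mean of each sign group as the reconstruction value is an assumption made to keep the example small.

```python
def dequantize(bits, pos_value, neg_value):
    """Rebuild an approximate gradient from one bit per element."""
    return [pos_value if b else neg_value for b in bits]


def one_bit_quantize(grad, residual):
    """Quantize a gradient to one bit per element with error feedback.

    grad     -- list of floats, the gradient for this minibatch
    residual -- list of floats, quantization error left over from the last step
    Returns (bits, pos_value, neg_value, new_residual).
    """
    # Add back the error that the previous quantization step dropped.
    corrected = [g + r for g, r in zip(grad, residual)]

    # One bit per element: True for non-negative values, False otherwise.
    bits = [c >= 0.0 for c in corrected]

    # Assumed reconstruction values: the mean of each group (0.0 if empty).
    pos = [c for c, b in zip(corrected, bits) if b]
    neg = [c for c, b in zip(corrected, bits) if not b]
    pos_value = sum(pos) / len(pos) if pos else 0.0
    neg_value = sum(neg) / len(neg) if neg else 0.0

    # Whatever the one-bit encoding lost is kept locally for the next step.
    decoded = dequantize(bits, pos_value, neg_value)
    new_residual = [c - d for c, d in zip(corrected, decoded)]
    return bits, pos_value, neg_value, new_residual


if __name__ == "__main__":
    grad = [0.5, -0.25, 1.5, -0.75]
    bits, pos_value, neg_value, residual = one_bit_quantize(grad, [0.0] * len(grad))
    print(bits, pos_value, neg_value, residual)
```

In a setup like this, only the bit vector and the two reconstruction values would need to be communicated, which is where the bandwidth saving comes from. The residual stays on the worker, so the gradient information is delayed rather than discarded.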
This tutorial is sponsored by Microsoft.
Anusua Trivedi is a data scientist on Microsoft’s advanced data science and strategic initiatives team, where she develops advanced predictive analytics and deep learning models. Previously, Anusua was a data scientist at the Texas Advanced Computing Center (TACC), a supercomputer center, where she developed algorithms and methods for the supercomputer to explore, analyze, and visualize clinical and biological big data. Anusua is a frequent speaker at machine learning and big data conferences across the United States, including Supercomputing 2015 (SC15), PyData Seattle 2015, and MLconf Atlanta 2015. She has also held positions at UT Austin and the University of Utah.
©2017, O'Reilly Media, Inc.