In recent years, dramatic progress has been made in the field of computer vision using deep neural network (DNN) technology. DNN models can now be trained on tens of millions of images to reliably recognize thousands of different classes of images. Microsoft has been a leading force in the advancement of this technology, with its development of the ResNet modeling technique, which won the 2015 ImageNet Object Detection competition. While these state-of-the-art research results are impressive, an even more valuable aspect of these DNN models is the ease with which they can be adapted to new use cases without requiring extensive, computation-heavy retraining.
Timothy Hazen offers an overview of Microsoft’s DNN technology for computer vision, describing both how the technology works and how Microsoft is making this technology available for outside users to build their own custom computer vision solutions. Timothy begins with an overview of Microsoft’s deep learning techniques for image classification and object detection and then explores the collection of technologies that Microsoft has made available for its users to build their own custom solutions, specifically focusing on the Microsoft Cognitive Toolkit and the Microsoft Image Recognition Intelligent Service (IRIS).
The Microsoft Cognitive Toolkit is a free, easy-to-use, open source, commercial-grade toolkit for training deep learning algorithms. With it, users can not only train image processing models from scratch but also adapt existing pretrained state-of-the-art models to new use cases using their own data. Using this approach, high-quality models can be created for custom use cases with only a fraction of the data used to train Microsoft’s large-scale ImageNet models. Timothy demonstrates how this Cognitive Toolkit capability can be used to quickly build a custom computer vision solution for recognizing and localizing specific food products in a refrigerator.
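The idea of adapting a pretrained model with a small amount of new data can be illustrated with a minimal sketch in plain Python. This is not the Cognitive Toolkit API. The `frozen_features` function is a hypothetical stand-in for the pretrained layers of a real network, the toy "images" are flat lists of pixel intensities, and every name and parameter here is an assumption made for illustration. Only the small output layer is trained, which is why a handful of labeled examples can be enough.

```python
import math
import random

def frozen_features(image):
    """Hypothetical stand-in for the frozen, pretrained layers of a DNN.
    Maps a flat list of pixel intensities to two fixed features."""
    mean = sum(image) / len(image)
    contrast = sum(abs(a - b) for a, b in zip(image, image[1:])) / (len(image) - 1)
    return [mean, contrast]

def train_head(examples, labels, epochs=200, lr=0.5):
    """Train only a new logistic-regression output layer on the frozen features."""
    feats = [frozen_features(x) for x in examples]  # computed once, never updated
    w = [0.0] * len(feats[0])
    b = 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            z = sum(wi * fi for wi, fi in zip(w, f)) + b
            p = 1.0 / (1.0 + math.exp(-z))          # predicted probability of class 1
            g = p - y                               # gradient of the log loss w.r.t. z
            w = [wi - lr * g * fi for wi, fi in zip(w, f)]
            b -= lr * g
    return w, b

def predict(head, image):
    """Return 1 or 0 for a new image using the frozen features and trained head."""
    w, b = head
    f = frozen_features(image)
    return 1 if sum(wi * fi for wi, fi in zip(w, f)) + b > 0 else 0

# Toy custom dataset (an assumption): 5 dark images (class 0), 5 bright images (class 1).
rng = random.Random(0)
dark = [[rng.uniform(0.0, 0.3) for _ in range(16)] for _ in range(5)]
bright = [[rng.uniform(0.7, 1.0) for _ in range(16)] for _ in range(5)]
head = train_head(dark + bright, [0] * 5 + [1] * 5)
```

Calling `predict(head, image)` on a new image reuses the frozen features and applies only the newly trained layer. In a real setting the feature extractor would be the pretrained network itself, and the new layer would be trained with the toolkit rather than by hand.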
To simplify the process of building custom computer vision models even further, Microsoft has developed IRIS, a simple web service that lets users upload image datasets, define image classification classes, and annotate data. From the user’s data, IRIS then automatically builds a new deep learned model and deploys the model within a web service that can automatically annotate new images based on the user-defined annotation schema. Timothy shows how IRIS can be used to quickly build and deploy a model for recognizing animal species in an image, specifically to enable the automatic monitoring of endangered animals in the wild using motion capture cameras.
Timothy J. Hazen is a principal data science manager in Microsoft’s Cloud and Enterprise Data group, where he leads a data science team in the development of customer-facing machine learning capabilities for the Microsoft Azure platform, primarily in the areas of image processing and natural language processing. Timothy has also developed natural language technology used within Microsoft’s Bing and Cortana products. Previously, he spent six years as a member of the Human Language Technology group at MIT’s Lincoln Laboratory and nine years as a research scientist at the MIT Computer Science and Artificial Intelligence Laboratory. Timothy holds an SB, SM, and PhD in electrical engineering and computer science from the Massachusetts Institute of Technology.
©2017, O'Reilly Media, Inc.