Motivated by making technology more accessible, Anirudh Koul and Saqib Shaikh explain how deep learning can enrich image understanding and, in turn, enable the blind community to experience and interact with the physical world more holistically than ever before. The intersection of vision and language is a ripe area of research and, fueled by advances in deep learning, is the future of computer vision.
Anirudh and Saqib explore how computer vision has evolved over its history and outline cutting-edge research in the field, particularly image captioning. Going beyond object classification, these techniques attempt to understand objects in context, along with their relationships, and describe them in a sentence. Drawing on the winning entry in the Microsoft COCO Captioning Challenge 2015, Anirudh and Saqib look at how developers can apply these state-of-the-art techniques in their own projects. For example, it is now possible to generate detailed descriptions of images such as “I see a young man on a sofa reading a book” or “I see people jogging at the beach.” This research can be extremely useful to the blind and is also beneficial to businesses that rely on image search. Anirudh and Saqib briefly cover Microsoft’s Project Oxford, a set of machine-learning APIs for vision, speech, and facial recognition that is now open for use, making it straightforward for developers to integrate state-of-the-art image understanding into their own applications. With the help of one of their key developers (who also happens to be blind), Anirudh and Saqib then demo a practical application of these techniques to showcase the transformation this technology can bring to someone’s daily routine.
Anirudh Koul is a senior data scientist at Microsoft AI and Research. An entrepreneur at heart, he has been running a mini-startup team within Microsoft, prototyping ideas using computer vision and deep learning techniques for augmented reality, productivity, and accessibility and building tools for communities with visual, hearing, and mobility impairments. Anirudh brings a decade of production-oriented applied research experience on petabyte-scale social media datasets, including Facebook, Twitter, Yahoo Answers, Quora, Foursquare, and Bing. A regular at hackathons, he has won close to three dozen awards, including top-three finishes for three consecutive years in the world’s largest private hackathon, with 16,000 participants. Some of his recent work, which IEEE has called “life changing,” has been showcased at a White House AI event, at Netflix, and by National Geographic, as well as to the prime ministers of Canada and Singapore.
Saqib Shaikh is a software engineer at Microsoft, where he has worked for 10 years. Saqib has developed a variety of Internet-scale services and data pipelines powering Bing, Cortana, Edge, MSN, and various mobile apps. Being blind, Saqib is passionate about accessibility and universal design; he serves as an internal consultant for teams including Windows, Office, Skype, and Visual Studio and has spoken at several international conferences. Saqib has won three Microsoft hackathons in the past year. His current interests focus on the intersection between AI and HCI and the application of technology for social good.
©2016, O’Reilly UK Ltd • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • email@example.com
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries and are used with permission. The Apache Software Foundation has no affiliation with and does not endorse or review the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.