Motivated by a desire to make technology more accessible, Anirudh Koul and Saqib Shaikh explain how deep learning can enrich image understanding, which can in turn enable the blind community to experience and interact with the physical world more holistically than ever before. The intersection of vision and language is a ripe area of research and, fueled by advances in deep learning, is the future of computer vision.
Anirudh and Saqib explore how computer vision has evolved over time and outline cutting-edge research in the field, especially in image captioning. Going beyond object classification, these techniques attempt to understand objects in context, along with their relationships, and describe them in a sentence. Drawing on the winning entry in the Microsoft COCO Captioning Challenge 2015, Anirudh and Saqib look at how developers can use these state-of-the-art techniques in their own projects. For example, it is now possible to generate detailed descriptions of images such as “I see a young man on a sofa reading a book” or “I see people jogging at the beach.” This research can be extremely useful to blind people and also benefits businesses that rely on image search. Anirudh and Saqib also briefly cover Microsoft’s Project Oxford, a set of machine-learning APIs for vision, speech, and facial recognition that are now open for developers to use, making it straightforward to integrate state-of-the-art image understanding into their own applications. With the help of one of their key developers (who also happens to be blind), Anirudh and Saqib then demo a practical application of these techniques to show the transformation this technology can bring to someone’s daily routine.
Anirudh Koul is a data scientist at Microsoft. Anirudh brings a decade of applied research experience with petabyte-scale datasets from sources including Facebook, Twitter, Yahoo Answers, Quora, Foursquare, and Bing. He has worked on a variety of machine-learning, natural language processing, and information retrieval projects at Yahoo, Microsoft, and Carnegie Mellon University. Adept at rapidly prototyping ideas, Anirudh has won more than two dozen innovation, programming, and 24-hour hackathons organized by companies including Facebook, Google, Microsoft, IBM, and Yahoo. He was also the keynote speaker at the 2014 SMX conference in Munich, where he spoke about trends in applying machine learning to big data.
Saqib Shaikh is a software engineer at Microsoft, where he has worked for 10 years. Saqib has developed a variety of Internet-scale services and data pipelines powering Bing, Cortana, Edge, MSN, and various mobile apps. Saqib, who is blind, is passionate about accessibility and universal design; he serves as an internal consultant for teams including Windows, Office, Skype, and Visual Studio and has spoken at several international conferences. Saqib has won three Microsoft hackathons in the past year. His current interests focus on the intersection of AI and HCI and the application of technology for social good.
©2016, O’Reilly UK Ltd. All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.
Apache Hadoop, Hadoop, Apache Spark, Spark, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries, and are used with permission. The Apache Software Foundation has no affiliation with, and does not endorse or review, the materials provided at this event, which is managed by O'Reilly Media and/or Cloudera.