Motivated by making technology more accessible, Anirudh Koul and Saqib Shaikh explain how deep learning can enrich image understanding, which in turn can enable the blind community to experience and interact with the physical world more holistically than has ever been possible. The intersection of vision and language is a ripe area of research and, fueled by advances in deep learning, is the future of computer vision.
Anirudh and Saqib explore how computer vision has evolved and outline cutting-edge research in this area, particularly image captioning. Going beyond object classification, they attempt to understand objects in context, as well as the relationships between them, and describe them in a sentence. Drawing on the winning entry at the Microsoft COCO Captioning Challenge 2015, Anirudh and Saqib look at how developers can use these state-of-the-art techniques in their own projects. For example, it is now possible to generate very detailed descriptions of images such as “I see a young man on a sofa reading a book” or “I see people jogging at the beach.” This research can be extremely useful to the blind and also benefits businesses that rely on image search. Anirudh and Saqib also briefly cover Microsoft’s Project Oxford, a set of machine-learning APIs for vision, speech, and facial recognition that are now open for use, making it straightforward for developers to integrate state-of-the-art image understanding into their own applications. With the help of one of their key developers (who also happens to be blind), Anirudh and Saqib then demo a practical application of these techniques to showcase the transformation that this technology can bring to someone’s daily routine.
Anirudh Koul is the head of AI and research at Aira, noted by Time magazine as one of the best inventions of 2018. He’s a noted AI expert and O’Reilly author whose work includes the upcoming Practical Deep Learning for Cloud and Mobile. Previously, he was a scientist at Microsoft AI, where he founded Seeing AI, the most-used technology among the blind community after the iPhone. With features shipped to a billion users, he brings over a decade of production-oriented applied research experience on petabyte-scale datasets. He’s been developing technologies using AI techniques for augmented reality, robotics, speech, productivity, and accessibility. Some of his recent work, which IEEE has called “life-changing,” has been honored by CES, FCC, Cannes Lions, and the American Council of the Blind; showcased at events by the UN, the White House, the House of Lords, the World Economic Forum, Netflix, and National Geographic; and applauded by world leaders including Justin Trudeau and Theresa May.
Saqib Shaikh is a software engineer at Microsoft, where he has worked for 10 years. Saqib has developed a variety of Internet-scale services and data pipelines powering Bing, Cortana, Edge, MSN, and various mobile apps. Being blind, Saqib is passionate about accessibility and universal design; he serves as an internal consultant for teams including Windows, Office, Skype, and Visual Studio and has spoken at several international conferences. Saqib has won three Microsoft hackathons in the past year. His current interests focus on the intersection between AI and HCI and the application of technology for social good.
©2016, O’Reilly UK Ltd • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.