In December 1975, a Kodak engineer named Steven Sasson captured the first digital photograph. Sasson’s prototype camera took 23 seconds to record an image onto a cassette tape; the picture was black and white and only 100 by 100 pixels in size. In the nearly forty years since that day, digital camera resolution has improved by roughly an order of magnitude per decade, a curve of improvement reminiscent of the legendary Moore’s Law, which predicted the geometric increase in affordable computational power that has so changed our contemporary world. Call it the More Pixels Law.
This rise in capture capacity has been accompanied by an era of major breakthroughs in computer vision that allow us to extract ever more information from images. The result is that the camera has become the essential sensor of the contemporary era. Cameras can track human motion, detect objects, identify faces, read distant text, reconstruct a scene in 3D, and even take a pulse. Moreover, the rise of smartphones has created a widely distributed platform that’s perfect for building and distributing computer vision applications. Smartphone cameras are ubiquitous, constantly upgraded, and already connected to sophisticated software platforms that gain new capabilities with the invention of each new computer vision technique. The camera is the only sensor that can develop whole new capacities after it’s been deployed, simply through new software.
In this session, I’ll provide a survey of the current state of computer vision geared toward understanding the technological and business possibilities. I’ll also introduce the basic concepts needed to get started building your first computer vision applications, including some of the best prototyping tools and learning environments. If you’re a designer or decision maker, you’ll leave knowing what’s possible and how it could affect your business. If you’re a technologist of any experience level, you’ll leave with the tools to get started in this important new domain.
Greg Borenstein is an artist, technologist, and teacher. He creates illusions for humans and machines. His work explores computer vision, machine learning, game design, visual effects, and drawing as media for storytelling and design.
Greg is a graduate of the NYU Interactive Telecommunications Program and has worked for firms such as MakerBot and Berg London. He is the author of an O’Reilly book about the Microsoft Kinect, Making Things See: 3D Vision with Kinect, Processing, Arduino, and MakerBot.
He’s also the author of OpenCV for Processing, a computer vision library for creative coding, and is currently at work on Getting Started with Computer Vision, an interactive O’Reilly book introducing computer vision to a wider audience.
He’s currently a researcher in the Playful Systems Group at the MIT Media Lab.