Deep learning approaches rely on simulating large, multilayered networks of virtual neurons. These structures have shown remarkable promise in recognizing abstract patterns. This has led to breakthroughs in AI problems such as speech recognition, image recognition, and language translation.
The construction and architecture of these neural networks are crucial to this approach, but data is just as vital. Data has been called the “oxygen” of the machine intelligence revolution: a vast array of interconnected artificial neurons cannot produce useful output until it has been trained, often on very large datasets. Companies such as Google, Facebook, and Baidu have used their massive collections of data to build a significant competitive advantage in the machine intelligence space. The resulting algorithms power best-in-class applications and services for their users and customers. What has emerged is a virtuous cycle for these companies. More data leads to better algorithms, which deliver better applications. Better applications attract more users, who in turn feed more data back into the system. For these large internet companies, this cycle further strengthens their competitive advantage in AI.
For small companies, the landscape looks very different. A growing “data wall” prevents smaller entities from benefiting from this data-user feedback loop. Without the data to validate and improve their approaches, small companies struggle to compete. There is a risk that the fruits of the machine intelligence revolution will be enjoyed only by the largest data-driven tech giants.
Some initiatives are beginning to make data more accessible, for example through dataset vendors or data marketplaces. Another approach also shows considerable promise: training models on synthetic datasets and then deploying them in the real world. The synthetic data approach circumvents the data wall, allowing smaller entities to create remarkably accurate models. It also offers additional advantages in data ownership and privacy.
Cormac Brick and Xiaofan Xu outline current approaches to training on synthetic data, including two projects under development at Movidius (now an Intel company). They conclude by discussing the limitations of the synthetic data approach and opportunities for hybrid approaches that combine synthetic and real-world data. They also cover future directions to explore as both approaches develop.
Xiaofan Xu is a research engineer at Intel specializing in artificial intelligence and robotics. Previously, Xiaofan worked in the CTO office at Movidius on various research projects, including 3D volumetric object recognition using convolutional neural networks and training neural networks using synthetic data.
Cormac Brick is director of machine intelligence in the Movidius Group at Intel Corporation, where he builds new foundational algorithms for computer vision and machine intelligence to enhance the Myriad VPU product family. Cormac contributes to internal architecture. He also helps customers build products using the latest techniques in deep learning and embedded vision through a set of advanced applications and libraries. He has worked with Movidius since its early days and has contributed heavily to the design of the ISA and hardware systems, as well as to computer vision software and tools. Cormac holds a BEng in electronic engineering from University College Cork.
©2017, O'Reilly Media, Inc.