Synthetic data promises massive sets of perfectly labeled training data for a fraction of the cost of manually annotated data. But doubts remain about how effective synthetic datasets are for training machine learning models.
Daeil Kim outlines the advantages of synthetic data and explains how to avoid the traps that lead to dead zones and false positives. He also reviews work on simulation-generated synthetic data in application verticals where large datasets have traditionally been difficult to acquire manually. If you struggle with sparse training datasets, this is the talk for you.
Driven by a passion to create a better world with AI, Daeil Kim founded AI.Reverie, a simulation platform for training AI to understand the world. Daeil believes we can build a future in which needs related to food, shelter, and health are met efficiently with the help of AI. Daeil grew up in New York City. He holds a liberal arts degree from Sarah Lawrence College, where he focused on literature. An interest in medicine led him to New Mexico to research schizophrenia and to study mental illness through artificial intelligence. He then earned a PhD in computer science at Brown University, focusing on the development of scalable machine learning algorithms. Afterward, his interest in developing tools for investigative journalism led him to a career as a data scientist at the New York Times.
©2018, O’Reilly UK Ltd