The 2016 Yelp Restaurant Photo Classification challenge on Kaggle attracted considerable attention in the data science community. The challenge consisted of classifying pictures of restaurants into several categories, such as good ambiance, services offered, and friendliness.
One of the main difficulties was the small number of images per category, which limited the use of many state-of-the-art image classification techniques, such as pure deep learning. In response, participants devised a number of ingenious solutions that combine ensemble methods with deep learning models to achieve a high classification score.
Natalino Busa shares an implementation based on Spark and Slider. Spark processes the data and trains the ML model, which combines deep learning with ensemble classification methods, while picture scoring is exposed via an API that is persisted and scaled with Slider. Join in to see how it all works (and get a glimpse of some truly tasty pictures).
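As a rough illustration of the ensemble idea only, and not the implementation presented in the session, the sketch below averages per-category probabilities from several models and thresholds the result. The category names are borrowed from the examples above; the function name `soft_vote`, the three models, the 0.5 threshold, and all scores are made up for the example.

```python
from statistics import mean

# Hypothetical category names, borrowed from the examples in the abstract.
CATEGORIES = ["good_ambiance", "services_offered", "friendliness"]


def soft_vote(model_scores, threshold=0.5):
    """Average per-category probabilities across models, then threshold.

    model_scores: list of dicts, one per model, mapping category -> probability.
    Returns a dict mapping category -> bool (True if the label is assigned).
    """
    averaged = {c: mean(s[c] for s in model_scores) for c in CATEGORIES}
    return {c: p >= threshold for c, p in averaged.items()}


# Made-up scores for a single picture from three hypothetical models
# (for instance, classifiers trained on features from a deep network).
scores = [
    {"good_ambiance": 0.9, "services_offered": 0.2, "friendliness": 0.6},
    {"good_ambiance": 0.7, "services_offered": 0.4, "friendliness": 0.3},
    {"good_ambiance": 0.8, "services_offered": 0.1, "friendliness": 0.5},
]

labels = soft_vote(scores)
print(labels)
```

Averaging scores in this way tends to smooth out the errors of individual models, which is one reason ensembles are attractive when few training images are available per category.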
Natalino Busa is the chief data architect at DBS, where he leads the definition, design, and implementation of big, fast data solutions for data-driven applications, such as predictive analytics, personalized marketing, and security event monitoring. Natalino is an all-around technology manager, product developer, and innovator with a 15+-year track record in research, development, and management of distributed architectures and scalable services and applications. Previously, he was the head of data science at Teradata, an enterprise data architect at ING, and a senior researcher at Philips Research Laboratories on the topics of system-on-a-chip architectures, distributed computing, and parallelizing compilers.
©2017, O’Reilly UK Ltd