Presented By O'Reilly and Cloudera
Make Data Work
22–23 May 2017: Training
23–25 May 2017: Tutorials & Conference
London, UK

Classifying restaurant pictures: An API with Spark and Slider

17:25–18:05 Wednesday, 24 May 2017
Level: Intermediate

Who is this presentation for?

  • Data scientists, data engineers, and those working with Hadoop, Spark, or Slider

Prerequisite knowledge

  • Basic familiarity with machine-learning models

What you'll learn

  • Explore how ensemble models can be combined with deep learning in Spark and how the result can be exposed as a persistent, scalable API using Slider


The 2016 Yelp Restaurant Photo Classification challenge on Kaggle attracted a lot of attention in the data science community. The challenge consisted of classifying pictures of restaurants into several categories, such as good ambiance, services offered, and friendliness.

One of the many difficulties was the low number of images per category, which limited the use of many state-of-the-art image classification techniques, such as pure deep learning. In response, participants devised a number of ingenious solutions that combined ensemble methods with deep learning models to achieve high classification scores.

Natalino Busa shares an implementation based on Spark and Slider. Spark processes the data and trains the ML model, which combines deep learning and ensemble classification methods, while picture scoring is exposed via an API that is persisted and scaled with Slider. Join in to see how it all works (and get a glimpse of some truly tasty pictures).

Natalino Busa


Natalino Busa is the chief data architect at DBS, where he leads the definition, design, and implementation of big, fast data solutions for data-driven applications, such as predictive analytics, personalized marketing, and security event monitoring. Natalino is an all-around technology manager, product developer, and innovator with a track record of more than 15 years in research, development, and management of distributed architectures and scalable services and applications. Previously, he was the head of data science at Teradata, an enterprise data architect at ING, and a senior researcher at Philips Research Laboratories, where he worked on system-on-a-chip architectures, distributed computing, and parallelizing compilers.