Using deep learning models to extract the most value from 360-degree images

Shourabh Rawat (Zillow)

4:00pm–4:40pm Thursday, September 12, 2019

Location: 230 A

Models and Methods

Secondary topics: Computer Vision, Deep Learning, Machine Learning

Average rating: 4.00 (2 ratings)

Level

Intermediate

Recent camera advances that enable automatic panorama generation have made 360-degree images ubiquitous in industries ranging from real estate to ecommerce and travel. These panoramic views offer an immersive experience that benefits consumers, but their wide field of view makes it hard for businesses to direct viewers to the most important parts of the scene. Trulia’s parent company, Zillow Group, uses this technology to create 3-D home views that let users see a complete view of a room and find the perfect home. Businesses must ensure that viewers see the most engaging part of the image first, and this need becomes paramount when a panorama has to be represented as a static 2-D image. The key is to identify a salient thumbnail: a view chosen to be the most informative for each panorama and to help drive engagement.

Shourabh Rawat explores how to use and train saliency score models, deep learning techniques, and algorithms to identify and extract the most visually informative and pleasing viewpoints to create this salient thumbnail. To compute a saliency score, Trulia relies on three deep convolutional neural networks. The scene model captures how representative a viewpoint is, so that the most relevant photos are chosen for a real estate listing (e.g., a kitchen or living room rather than a blank wall or window). The attractiveness model penalizes low visual quality, such as blurry or dark photos, and rewards aesthetically pleasing photos with a high score; because home location and listing price also tend to affect photo quality, it involves training a deep learning model to label properties as either luxury or fixer-upper. The appropriateness model differentiates relevant viewpoints, like views of a bedroom, from irrelevant ones, like walls or humans.
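The abstract does not say how the three model outputs are combined into a single saliency score. As a minimal sketch, assume each model returns a score in [0, 1], that scene and attractiveness are averaged with hypothetical weights, and that appropriateness acts as a multiplicative gate. The names `ViewpointScores` and `saliency_score` are illustrative only, not Trulia's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViewpointScores:
    """Hypothetical per-viewpoint outputs of the three models, each in [0, 1]."""
    scene: float            # how representative the view is (kitchen vs. blank wall)
    attractiveness: float   # visual quality (penalizes blurry or dark views)
    appropriateness: float  # relevance (penalizes walls, people)


def saliency_score(scores: ViewpointScores,
                   w_scene: float = 0.5,
                   w_attr: float = 0.5) -> float:
    """Combine the three scores into one value in [0, 1].

    Assumption: a weighted average of scene and attractiveness,
    scaled by appropriateness so that an inappropriate view scores
    low no matter how attractive it is.
    """
    for value in (scores.scene, scores.attractiveness, scores.appropriateness):
        if not 0.0 <= value <= 1.0:
            raise ValueError("each model score must lie in [0, 1]")
    base = (w_scene * scores.scene + w_attr * scores.attractiveness) / (w_scene + w_attr)
    return scores.appropriateness * base


# Example with made-up scores: a kitchen view versus a blank wall.
kitchen = ViewpointScores(scene=0.9, attractiveness=0.8, appropriateness=0.95)
blank_wall = ViewpointScores(scene=0.2, attractiveness=0.6, appropriateness=0.1)
best = max((kitchen, blank_wall), key=saliency_score)
```

Using appropriateness as a gate rather than a third additive term is a design choice for this sketch: it prevents a sharp, well-lit view of a wall from outranking a slightly darker view of a living room.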

Prerequisite knowledge

A basic understanding of deep learning algorithms and models

What you'll learn

Learn to create a saliency model that defines criteria for salient thumbnails, ensuring they are representative, attractive, and diverse
Discover how to extract salient thumbnails while maintaining important aspects like specific field of view, 3-D orientation, aspect ratio, and viewport size
Create an algorithm that ranks all potential thumbnails extracted from the panorama against defined saliency criteria, using scene, attractiveness, and appropriateness
Deploy these images within your organization’s practices
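The extraction and ranking steps can be sketched as follows. This is an illustration under stated assumptions, not the method presented in the session: the panorama is assumed to be equirectangular, candidates are sampled at a fixed yaw step, and diversity is enforced by a minimum yaw gap between chosen views. All names (`Viewport`, `candidate_viewports`, `rank_viewports`) and default values are hypothetical, and `score_fn` stands in for whatever saliency model is used. Rendering the actual crop (a perspective reprojection of the panorama) is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Viewport:
    yaw_deg: float       # horizontal rotation, in [0, 360)
    pitch_deg: float     # vertical tilt, in [-90, 90]
    fov_deg: float       # horizontal field of view of the thumbnail
    aspect_ratio: float  # thumbnail width / height


def candidate_viewports(yaw_step_deg: float = 30.0,
                        pitches_deg: Sequence[float] = (0.0,),
                        fov_deg: float = 90.0,
                        aspect_ratio: float = 4 / 3) -> List[Viewport]:
    """Sample candidate views evenly around the panorama."""
    steps = int(round(360.0 / yaw_step_deg))
    return [Viewport(i * yaw_step_deg, pitch, fov_deg, aspect_ratio)
            for pitch in pitches_deg for i in range(steps)]


def center_pixel(view: Viewport, pano_width: int, pano_height: int) -> Tuple[int, int]:
    """Map a view direction to its pixel in an equirectangular panorama,
    where x is linear in yaw and y is linear in pitch."""
    x = int((view.yaw_deg % 360.0) / 360.0 * pano_width) % pano_width
    y = min(int((90.0 - view.pitch_deg) / 180.0 * pano_height), pano_height - 1)
    return x, y


def yaw_distance(a_deg: float, b_deg: float) -> float:
    """Smallest angle between two yaw values, accounting for wrap-around."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def rank_viewports(views: Sequence[Viewport],
                   score_fn: Callable[[Viewport], float],
                   top_k: int = 3,
                   min_yaw_gap_deg: float = 60.0) -> List[Viewport]:
    """Greedily pick the highest-scoring views that are at least
    min_yaw_gap_deg apart in yaw (pitch is ignored for simplicity)."""
    chosen: List[Viewport] = []
    for view in sorted(views, key=score_fn, reverse=True):
        if all(yaw_distance(view.yaw_deg, c.yaw_deg) >= min_yaw_gap_deg for c in chosen):
            chosen.append(view)
        if len(chosen) == top_k:
            break
    return chosen


# Example with a stand-in scorer that prefers views facing yaw = 90 degrees.
views = candidate_viewports()
top = rank_viewports(views, score_fn=lambda v: -yaw_distance(v.yaw_deg, 90.0))
```

The greedy yaw-gap filter is one simple way to keep the selected thumbnails diverse; without it, the top results would tend to be near-duplicates of the single best view.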

Shourabh Rawat

Zillow

Shourabh Rawat is a senior engineering manager of applied sciences at Zillow. He has over five years of industry experience in AI, deep learning, computer vision, and personalization, deploying these systems to production at scale. Shourabh and his team focus on developing data science solutions to better understand Zillow’s customers, specifically how they engage with content and property recommendations. Shourabh completed his master’s degree at Carnegie Mellon University, where he did research on event detection in consumer videos, applying deep learning to multimodal (audio and image) data.