Large online fashion retailers must efficiently maintain catalogues of millions of items. Due to human error, it’s not unusual for some items to have duplicate entries. Since manually trawling such a large catalogue is next to impossible, how can you find these entries?
You might take a snapshot of a newly arrived item with your phone and have an algorithm automatically check, based on visual appearance, whether the item is already registered. However, content-based image retrieval is likely to be hindered by differences in visual content between the images, such as the busy background of a mobile photo versus a clean studio shot, not to mention inconsistent folding or creases, lighting, scale, and viewing angle. To increase the success rate, it’s prudent to remove the background of the query image before applying any retrieval algorithm.
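As a rough illustration of the preprocessing step described above, the sketch below blanks out the background of a query image given a binary foreground mask. In practice the mask would come from a segmentation model; here it is built by hand. The function name `remove_background` and the white fill value are assumptions made for illustration, not details from the talk.

```python
import numpy as np

def remove_background(image, mask, fill=255):
    """Return a copy of `image` with background pixels set to `fill`.

    image: H x W x 3 uint8 array (the query photo).
    mask:  H x W array, nonzero where the garment (foreground) is.
    fill:  hypothetical background value; 255 mimics a white studio backdrop.
    """
    if image.shape[:2] != mask.shape:
        raise ValueError("image and mask must have the same height and width")
    result = np.full_like(image, fill)       # start from an all-background canvas
    foreground = mask.astype(bool)
    result[foreground] = image[foreground]   # copy garment pixels only
    return result

# Toy 4x4 "photo": a 2x2 garment patch (value 100) on a darker background (value 30).
image = np.full((4, 4, 3), 30, dtype=np.uint8)
image[1:3, 1:3] = 100
mask = np.zeros((4, 4), dtype=np.uint8)
mask[1:3, 1:3] = 1

cleaned = remove_background(image, mask)
```

The cleaned image, rather than the raw phone photo, would then be passed to whatever retrieval step follows.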
Patty Ryan, CY Yam, and Elena Terenzi explain how they developed a specialized segmentation model for background removal or garment (foreground) segmentation using one of the most recent deep learning architectures, Tiramisu. The solution achieved a remarkable segmentation accuracy of 94% with 200 training images and has been shown to significantly improve content-based image retrieval performance.
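The abstract does not say how the 94% figure was measured. One common way to score a foreground mask is pixel accuracy, the fraction of pixels whose predicted label matches the ground truth. The sketch below assumes that metric purely for illustration; the function name `pixel_accuracy` and the toy masks are hypothetical.

```python
import numpy as np

def pixel_accuracy(predicted, truth):
    """Fraction of pixels where the predicted mask agrees with the ground truth.

    Both inputs are H x W arrays; nonzero means foreground (garment).
    """
    predicted = np.asarray(predicted).astype(bool)
    truth = np.asarray(truth).astype(bool)
    if predicted.shape != truth.shape:
        raise ValueError("masks must have the same shape")
    return float((predicted == truth).mean())

# Toy 2x4 masks that disagree on exactly one of eight pixels.
truth = np.array([[0, 1, 1, 0],
                  [0, 1, 1, 0]])
predicted = np.array([[0, 1, 0, 0],
                      [0, 1, 1, 0]])

score = pixel_accuracy(predicted, truth)  # 7 of 8 pixels match: 0.875
```

Pixel accuracy can look flattering when the background dominates the image, so overlap measures such as intersection over union are often reported alongside it.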
Patty, CY, and Elena begin by discussing GrabCut, a very successful foreground segmentation method, and explain how they use it to create labeled data. They then offer an overview of their specialized segmentation tool, built on the Tiramisu deep learning architecture, and show where the model performs well and where its performance is less satisfactory. Patty, CY, and Elena conclude with a demonstration of how this tool can help prevent duplicate entries in a very large online fashion retailer’s catalogue.
Patty Ryan is an applied data scientist at Microsoft, where she codes with the company’s partners and customers to tackle tough problems using machine learning approaches with sensor, text, and vision data. She’s a graduate of the University of Michigan.
CY Yam is a data scientist at Microsoft, where she applies machine learning techniques to solve various problems in daily life. Previously, CY invented new ways to recognize people by the way they move.
Elena Terenzi is a software development engineer at Microsoft, where she brings business intelligence solutions to Microsoft Enterprise customers and advocates for business analytics and big data solutions for the manufacturing sector in Western Europe. Her work includes helping big automotive customers implement telemetry analytics solutions with an IoT flavor in their enterprises. She started her career with data as a database administrator and data analyst for an investment bank in Italy. Elena holds a master’s degree in AI and NLP from the University of Illinois at Chicago.