Presented By O'Reilly and Cloudera
Make Data Work
Feb 17–20, 2015 • San Jose, CA

Behavior-driven Machine Translation

Irina Borisova (Chegg), Asim Mathur (eBay)
10:40am–11:20am Friday, 02/20/2015
Data Science
Location: LL20 A

The language barrier is one of the key obstacles in cross-border trade. At eBay, we address this problem by building our own machine translation (MT) system, one of the first of its kind in the e-commerce domain. Building a seamless site experience in which MT plays a pivotal role requires three things: leveraging huge amounts of transactional and behavioral data to develop and evaluate our MT systems, adapting evaluation metrics to reflect the eBay buyer experience, and measuring translation quality and its impact on the shopping experience of our international users.

Modern MT technology relies on the quality and quantity of the language data used for development. In the first part of this talk, we will discuss how we select data to represent the hundreds of millions of products sold on eBay and the diverse language that describes them. Using user clickstream information collected daily, we choose the most relevant and popular data and ensure a diverse sample, extending our language coverage as much as possible.
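As an illustration only, the idea of combining popularity with diversity can be sketched in Python. This is not the eBay pipeline: the click log format (one item title per click), the `select_titles` function, the `popular_share` parameter, and the use of unseen whitespace-separated tokens as a stand-in for language coverage are all assumptions made for the example.

```python
from collections import Counter


def select_titles(click_log, budget, popular_share=0.5):
    """Pick `budget` titles: the most-clicked first, then the most novel.

    click_log is a hypothetical format: an iterable with one item title
    per recorded click.
    """
    clicks = Counter(click_log)
    # Titles ordered from most to least clicked.
    ranked = [title for title, _ in clicks.most_common()]

    # Part 1: fill a share of the budget with the most popular titles.
    n_popular = int(budget * popular_share)
    selected = ranked[:n_popular]

    seen_tokens = set()
    for title in selected:
        seen_tokens.update(title.lower().split())

    # Part 2: greedily add the title that contributes the most unseen
    # tokens, a crude proxy for extending language coverage.
    remaining = ranked[n_popular:]
    while len(selected) < budget and remaining:
        best = max(
            remaining,
            key=lambda t: len(set(t.lower().split()) - seen_tokens),
        )
        selected.append(best)
        seen_tokens.update(best.lower().split())
        remaining.remove(best)

    return selected


log = (["red shoes"] * 3
       + ["red shoes size 9"] * 2
       + ["vintage camera lens"])
sample = select_titles(log, budget=2)
```

With this toy log, the most-clicked title fills the popular half of the budget, and the remaining slot goes to the title that adds the most new vocabulary rather than to the second most-clicked one.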

Traditionally, MT uses several standard automatic metrics for quality evaluation. In the second part of our talk, we will discuss novel MT quality and impact evaluation metrics used at eBay. In the pre-launch evaluation phase, we use both automated metrics (e.g., search recall for a query translated with an MT model) and human judgment to get a better understanding of translation quality. After launch, we closely monitor user behavior on the site wherever MT is involved, and we identify markers and trend lines that help us understand MT impact in specific product categories. We will also share our experience analyzing direct user feedback ratings of translations, available through the “hover-over” feature on the site, and user perception of MT collected via user surveys.
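To make the search recall example concrete, here is a minimal Python sketch of one way such a metric could be computed. It assumes that the relevant set is the set of items returned for a human reference translation of the same query; the abstract does not say how eBay defines relevance, so that choice, the `search_recall` function name, and the item IDs are all hypothetical.

```python
def search_recall(retrieved_ids, relevant_ids):
    """Fraction of relevant items that the translated query retrieved."""
    relevant = set(relevant_ids)
    if not relevant:
        # No relevant items to find; report 0.0 rather than divide by zero.
        return 0.0
    return len(relevant & set(retrieved_ids)) / len(relevant)


# Hypothetical item IDs returned for the MT-translated query.
mt_results = [101, 102, 103]
# Hypothetical item IDs returned for a human reference translation.
reference_results = [102, 103, 104, 105]

recall = search_recall(mt_results, reference_results)
```

Here the MT-translated query finds two of the four reference items, so the recall is 0.5. Averaging this score over a set of test queries would give a single pre-launch number per MT model.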

Join this talk to learn about our practices for using big data to build high-quality, high-coverage language technology that keeps our customer experience at the forefront.

Irina Borisova

Chegg

Irina is an Applied Researcher at eBay, working on machine translation training and evaluation. Irina holds two master's degrees, in natural language processing and neurolinguistics.

Asim Mathur

eBay

Asim is a Senior Data Engineer on the Machine Translation team at eBay. He has been at eBay for seven years, working in Data Analytics, Data Engineering, and Business Intelligence.