Presented By
O’Reilly + Cloudera
Make Data Work
March 25-28, 2019
San Francisco, CA

Building the AI engine for retail in the new era

Jian Chang (Alibaba Group), Sanjian Chen (Alibaba Group)
11:00am-11:40am Wednesday, March 27, 2019
Average rating: 4.50 (4 ratings)

Who is this presentation for?

  • Executives, architects, engineers, and analysts



What you'll learn

  • Explore Alibaba's TSDB AI engine architecture
  • Learn data science best practices for retail data analytics and decision making


Global Market Insights forecasts that the retail analytics market will surpass US$13 billion by 2024. Traditional retailers face a serious competitive threat from online rivals. As a result, the retail industry is turning to deep data analytics and AI to revolutionize its decision-making processes. As a global leader in ecommerce and technology, Alibaba has been driving the emerging “new retail” trend, which centers on creating a customer experience by unifying online and offline behavior and data-driven operations. As you might imagine, this model generates a huge amount of spatiotemporal data (e.g., user behavior, logistics trajectories, and transactions).

Jian Chang and Sanjian Chen outline the design of the AI engine on Alibaba’s TSDB service, which enables fast and complex analytics of large-scale retail data. They then share a successful case study of the Fresh Hema Supermarket, a major “new retail” platform operated by Alibaba Group, highlighting solutions to the major technical challenges in data cleaning, storage, and processing.

TSDB is the backbone service for hosting data to enable high-concurrency storage and low-latency query. It also provides intelligent analysis capability using AI and other data science technologies. The TSDB service scales to thousands of physical nodes and delivers peak performance at 80 million operations per second.

Handling missing data is a key challenge in retail: For example, a missing store data point on a specific day could be caused by data transmission errors or actual store closure due to a holiday, renovations, or a natural disaster. How you treat such data gaps can profoundly impact the analytics results. The data cleaning module in the TSDB Intelligence Engine runs machine learning algorithms across multiple data sources to accurately diagnose the cause of missing data and automatically performs smart null-filling operations that are aligned with business expectations.
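To make the distinction concrete, here is a minimal sketch of cause-aware gap filling. It is not the TSDB data cleaning module: the function name `fill_gaps`, the `closed_days` input, and the simple rules (zero for a known closure, neighbor average for a presumed transmission error) are all assumptions that stand in for the machine learning diagnosis described above.

```python
def fill_gaps(sales, closed_days):
    """Fill None entries in a daily sales series (hypothetical helper).

    sales: list of floats or None, one entry per day.
    closed_days: set of day indexes on which the store is known to be closed.
    """
    filled = list(sales)
    for day, value in enumerate(sales):
        if value is not None:
            continue
        if day in closed_days:
            # A real closure: the store sold nothing, so zero is the true value.
            filled[day] = 0.0
            continue
        # Presumed transmission error: estimate from the nearest observed days.
        prev = next((sales[i] for i in range(day - 1, -1, -1)
                     if sales[i] is not None), None)
        nxt = next((sales[i] for i in range(day + 1, len(sales))
                    if sales[i] is not None), None)
        neighbours = [v for v in (prev, nxt) if v is not None]
        # Leave the gap open if there is nothing to estimate from.
        filled[day] = sum(neighbours) / len(neighbours) if neighbours else None
    return filled


if __name__ == "__main__":
    print(fill_gaps([100.0, None, 120.0, None, 90.0], closed_days={3}))
```

Filling the closure with zero and the transmission gap with an estimate keeps a weekly total honest in both cases; swapping the two treatments would either invent sales on a closed day or understate a normal trading day.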

TSDB also performs a multitude of optimizations to enable fast access and computation at runtime. For example, retail analytics applications frequently deal with data aggregations across different product hierarchies, hierarchical geographic organizations, and timelines. With customized optimization techniques, the preaggregation module in TSDB runs concurrent multilevel rollups on hundreds of financial sources along different temporal and spatial dimensions.
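The idea of a multilevel rollup can be sketched in a few lines. This is a single-threaded illustration only, not the TSDB preaggregation module: the record layout `(region, store, day, amount)`, the two hierarchies (store to region to all, day to week), and the function name `rollup` are assumptions chosen for the example.

```python
from collections import defaultdict


def rollup(records):
    """Precompute totals at every (geographic level, time level) pair.

    records: iterable of (region, store, day, amount) tuples, where day is
    an integer day index. Returns a dict keyed by
    (geo_level, geo_value, time_level, time_value).
    """
    totals = defaultdict(float)
    for region, store, day, amount in records:
        week = day // 7  # assumed 7-day buckets starting at day 0
        geo_levels = (("store", store), ("region", region), ("all", "*"))
        time_levels = (("day", day), ("week", week))
        # Each record contributes once to every combination of levels.
        for geo in geo_levels:
            for time in time_levels:
                totals[geo + time] += amount
    return dict(totals)


if __name__ == "__main__":
    data = [("west", "s1", 0, 10.0), ("west", "s2", 1, 5.0), ("east", "s3", 8, 7.0)]
    totals = rollup(data)
    print(totals[("region", "west", "week", 0)])
```

Because every level is materialized ahead of time, a query such as "weekly revenue for the west region" becomes a single lookup rather than a scan over raw records, which is the trade of storage for query latency that preaggregation makes.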

Another major analytical challenge in retail big data applications is the low signal-to-noise ratio: the net profit margin of leading retailers typically ranges from 1% to 3%, but the financial KPIs are influenced by numerous micro- and macroeconomic factors. TSDB leverages a rich set of advanced time series feature-extraction algorithms to quantify the true impact of business actions in the sea of noise. Alibaba also developed deep learning functions in the Intelligence Engine to automatically detect interesting trends in the real-time data streams and provide actionable insights.
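A small sketch shows why the noise level matters when judging a business action. It compares a KPI before and after the action and scales the shift by the pre-action variability. This is a deliberately simple stand-in, not one of the feature-extraction or deep learning algorithms mentioned above, and the function name `action_effect` and the sample margins are hypothetical.

```python
from statistics import mean, stdev


def action_effect(before, after):
    """Return (shift, signal_to_noise) for a KPI around a business action.

    before, after: lists of KPI observations (e.g., daily margin in percent);
    before needs at least two values so its standard deviation is defined.
    """
    shift = mean(after) - mean(before)
    noise = stdev(before)  # sample standard deviation of the baseline
    if noise == 0:
        # A perfectly flat baseline: any nonzero shift is unambiguous.
        return shift, (float("inf") if shift else 0.0)
    return shift, shift / noise


if __name__ == "__main__":
    margin_before = [2.0, 2.2, 1.8, 2.0]  # hypothetical daily margins, in %
    margin_after = [2.5, 2.7, 2.3, 2.5]
    print(action_effect(margin_before, margin_after))
```

With margins of only a few percent, a shift of half a point is meaningful only if it clearly exceeds the day-to-day variation, which is why the ratio is reported alongside the raw shift.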

With all the features above, the Intelligence Engine in TSDB provides a full stack analytics solution to help retail companies identify interesting patterns from the most fine-grained data sources and achieve higher ROI by leveraging detailed closed-loop decision feedback in real time.



Jian Chang

Alibaba Group

Jian Chang is a senior algorithm expert at the Alibaba Group, where he is working on cutting-edge applications of AI at the intersection of high-performance databases and the IoT, focusing on unleashing the value of spatiotemporal data. A data science expert and software system architect with expertise in machine learning and big data systems and deep domain knowledge of various vertical use cases (finance, telco, healthcare, etc.), Jian has led innovation projects and R&D activities to promote data science best practices within large organizations. He’s a frequent speaker at technology conferences, such as the O’Reilly Strata and AI Conferences, NVIDIA’s GPU Technology Conference, Hadoop Summit, DataWorks Summit, Amazon re:Invent, Global Big Data Conference, Global AI Conference, World IoT Expo, and Intel Partner Summit. He has published and presented research papers and posters at many top-tier conferences and journals, including ACM Computing Surveys, ACSAC, CEAS, EuroSec, FGCS, HiCoNS, HSCC, IEEE Systems Journal, MASHUPS, PST, SSS, TRUST, and WiVeC. He’s also served as a reviewer for many highly reputable international journals and conferences. Jian holds a PhD from the Department of Computer and Information Science (CIS) at the University of Pennsylvania, under Insup Lee.

Sanjian Chen

Alibaba Group

Sanjian Chen is a senior algorithm expert at the Alibaba Group. He has deep knowledge of large-scale machine learning algorithms. Over his career, he’s partnered with and advised leaders at several Fortune 500 companies on making data-driven strategic decisions and provided software-based data analytics consulting services to seven global firms across multiple industries, including financial services, automotive, telecommunications, and retail.