Presented By O'Reilly and Cloudera
Make Data Work
March 13–14, 2017: Training
March 14–16, 2017: Tutorials & Conference
San Jose, CA

Clustering user sessions with NLP methods in complex internet applications

Dorna Bandari (Jetlore)
11:00am–11:40am Thursday, March 16, 2017
Secondary topics: Hardcore Data Science, Media, Text
Average rating: 4.00 (2 ratings)

What you'll learn

  • Explore a novel NLP-based method for clustering user sessions in consumer internet applications, which has proved to be extremely effective in both driving strategy and personalization

Description

Most internet companies record a constant stream of logs as users interact with their applications. Depending on the application's complexity, these logs can be extremely difficult to decipher. Popular approaches to interpreting them are either too computationally intensive for complex applications or rely on simplifying assumptions that discard too much information. One option is sequential pattern mining, but its computational cost becomes prohibitive when use cases involve long sequences of actions. Another is to define each use case by a user performing a specific action or navigating to a specific page during a session; this assumption breaks down in complex applications, where users engage in diverse activities on each page.
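A quick count shows why long action sequences hurt pattern mining: a session of n actions has 2^n - 1 non-empty subsequences that preserve the original order (counted by position), and each is a potential candidate pattern. Practical miners such as GSP or PrefixSpan prune this space with a support threshold, so the figure is an upper bound on the raw search space rather than a measured cost. The action names below are hypothetical.

```python
from itertools import combinations


def count_subsequences(session):
    """Brute-force count of non-empty, order-preserving subsequences (by position)."""
    n = len(session)
    return sum(1 for k in range(1, n + 1) for _ in combinations(range(n), k))


# Hypothetical five-step session.
short_session = ["open_app", "search", "view_item", "add_to_cart", "checkout"]
print(count_subsequences(short_session))  # 31, i.e. 2**5 - 1

# Closed form 2**n - 1 for longer sessions, where enumeration is impractical.
for n in (10, 20, 40):
    print(n, 2**n - 1)
```

Doubling the session length squares the candidate count, which is why this approach stops scaling for applications with long use cases.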

Dorna Bandari presents a novel NLP-based method for clustering user sessions in consumer internet applications, which has proved extremely effective for both driving strategy and personalization. The clustering method borrows ideas from the field of NLP, which reduces the complexity of the problem without losing important information. The stability of the resulting clusters was assessed using the Jaccard similarity of clusters obtained from bootstrapped samples; the results showed that the method achieves 100% stability and can therefore be reliably productionized. The method is especially useful for applications with diverse and complex use cases, such as social networks, messaging platforms, and lifestyle applications. Other applications of this approach include personalizing the user experience on a per-session level, estimating the strategic and monetary value of different use cases of the application, analyzing experiment results by use case, and creating new features for other machine learning models.
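The session description does not spell out the method, so what follows is only a hypothetical sketch of the general idea, not the speaker's implementation. Each session is treated as a "sentence" of action tokens and represented as a TF-IDF vector over action unigrams and bigrams; a cosine (spherical) k-means is swapped in as a stand-in clusterer. Stability is then scored with a clusterwise Jaccard bootstrap in the style of Hennig: recluster resampled sessions and, for each original cluster, average its best Jaccard match among the bootstrap clusters. All function names, action names, and parameters (`k`, `n_boot`, the bigram window) are illustrative assumptions.

```python
# Hypothetical sketch: a stand-in for the talk's unpublished method.
import math
import random
from collections import Counter


def action_tokens(session, n=2):
    """Treat a session like a sentence: action unigrams plus order-preserving n-grams."""
    tokens = list(session)
    tokens += ["->".join(session[i:i + n]) for i in range(len(session) - n + 1)]
    return tokens


def normalise(vec):
    """L2-normalise a sparse vector stored as {token: weight}."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else vec


def tfidf(sessions):
    """Sparse TF-IDF vectors with smoothed IDF; assumes every session is non-empty."""
    counts = [Counter(action_tokens(s)) for s in sessions]
    df = Counter(tok for c in counts for tok in c)
    n = len(sessions)
    return [normalise({t: tf * (math.log((1 + n) / (1 + df[t])) + 1.0)
                       for t, tf in c.items()}) for c in counts]


def cosine(u, v):
    """Dot product of two already-normalised sparse vectors."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def spherical_kmeans(vectors, k, rng, iters=20):
    """Cosine k-means with farthest-point initialisation; returns one label per vector."""
    centroids = [vectors[rng.randrange(len(vectors))]]
    while len(centroids) < k:
        # Next seed: the vector least similar to every centroid chosen so far.
        centroids.append(min(vectors, key=lambda v: max(cosine(v, c) for c in centroids)))
    labels = None
    for _ in range(iters):
        new_labels = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            summed = Counter()
            for v, lab in zip(vectors, labels):
                if lab == j:
                    summed.update(v)  # element-wise sum of member vectors
            if summed:
                centroids[j] = normalise(dict(summed))
    return labels


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def bootstrap_stability(sessions, k, n_boot=20, seed=0):
    """Clusterwise Jaccard bootstrap: mean best-match Jaccard for each original cluster."""
    rng = random.Random(seed)
    labels = spherical_kmeans(tfidf(sessions), k, rng)
    clusters = [{i for i, lab in enumerate(labels) if lab == j} for j in range(k)]
    scores = [[] for _ in range(k)]
    n = len(sessions)
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample sessions with replacement
        boot_labels = spherical_kmeans(tfidf([sessions[i] for i in idx]), k, rng)
        boot_clusters = [{idx[p] for p, lab in enumerate(boot_labels) if lab == j}
                         for j in range(k)]
        present = set(idx)
        for j, cluster in enumerate(clusters):
            restricted = cluster & present  # compare only on sessions drawn this round
            if restricted:
                scores[j].append(max(jaccard(restricted, b) for b in boot_clusters))
    return labels, [sum(s) / len(s) if s else float("nan") for s in scores]


# Hypothetical toy logs: "feed" sessions and "inbox" sessions share no actions.
feed = [["open_feed", "scroll"] + ["view_item"] * (i % 3 + 1) + (["save_item"] if i % 2 else [])
        for i in range(10)]
inbox = [["open_inbox", "read_msg"] + ["reply"] * (i % 2 + 1) + (["send_photo"] if i % 3 == 0 else [])
         for i in range(10)]
labels, stability = bootstrap_stability(feed + inbox, k=2)
print(labels)
print(stability)
```

On this deliberately well-separated toy data each bootstrap should recover the same two groups, so the per-cluster scores should sit at or near 1.0; lower averages indicate clusters that tend to dissolve under resampling and are poor candidates for production use.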


Dorna Bandari

Jetlore

Dorna Bandari is a data scientist at Pinterest, where she specializes in developing new machine-learning models in a broad range of product areas, from concept creation to productionization.