Most internet companies record a constant stream of logs as users interact with their applications. The more complex the application, the harder these logs are to decipher. Popular approaches to extracting use cases from the logs are either too computationally intensive for complex applications or rely on simplifying assumptions that lose too much information. One option is sequential pattern mining, but its computational cost is prohibitive when use cases involve long sequences of actions. Another is to define each use case by a specific action a user performs or a specific page a user visits during a session, but this assumption breaks down in complex applications where users engage in diverse activities on each page.
Dorna Bandari presents a novel NLP-based method for clustering user sessions in consumer internet applications, which has proved extremely effective for both driving strategy and personalization. The method borrows ideas from NLP to reduce the complexity of the problem without losing important information. Cluster stability was assessed using the Jaccard similarity of bootstrapped samples; the results showed 100% stability, indicating that the method can be reliably productionized. The method is especially useful for applications with diverse and complex use cases, such as social networks, messaging platforms, and lifestyle applications. Other applications of this approach include personalizing the user experience at the session level, estimating the strategic and monetary value of different use cases, analyzing experiment results by use case, and creating new features for other machine-learning models.
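The abstract does not spell out how the bootstrap stability check was run, so the following is only a minimal sketch of one common way to do it. For each bootstrap resample, the sessions are reclustered and each reference cluster is scored by its best Jaccard match among the new clusters. The names `toy_cluster` and `bootstrap_stability`, the action tokens, and the session ids are all hypothetical, and the clustering rule is a trivial placeholder, not the NLP-based method presented in the talk.

```python
import random
from collections import Counter, defaultdict


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two sets (1.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def toy_cluster(sessions):
    """Placeholder clustering (an assumption, not the talk's method).

    `sessions` maps a session id to a non-empty list of action tokens.
    Sessions are grouped by their most frequent action token.
    Returns a list of sets of session ids.
    """
    groups = defaultdict(set)
    for sid, actions in sessions.items():
        top_action = Counter(actions).most_common(1)[0][0]
        groups[top_action].add(sid)
    return list(groups.values())


def bootstrap_stability(sessions, cluster_fn, n_boot=50, seed=0):
    """Return one stability score per reference cluster.

    Each score is the mean, over bootstrap resamples, of the best Jaccard
    match between the reference cluster (restricted to the resampled ids)
    and any cluster found on the resample.
    """
    rng = random.Random(seed)
    ids = list(sessions)
    reference = cluster_fn(sessions)
    totals = [0.0] * len(reference)
    counts = [0] * len(reference)
    for _ in range(n_boot):
        # Draw len(ids) ids with replacement; duplicates are collapsed
        # into a set here to keep the sketch simple.
        sampled = {rng.choice(ids) for _ in ids}
        boot_clusters = cluster_fn({sid: sessions[sid] for sid in sampled})
        for k, ref in enumerate(reference):
            ref_in_sample = ref & sampled
            if not ref_in_sample:
                continue  # this cluster is absent from the resample
            totals[k] += max(jaccard(ref_in_sample, c) for c in boot_clusters)
            counts[k] += 1
    return [t / n if n else float("nan") for t, n in zip(totals, counts)]


# Hypothetical sessions: each is a "document" whose "words" are logged actions.
sessions = {
    "s1": ["search", "search", "pin"],
    "s2": ["search", "click", "search"],
    "s3": ["message", "message", "search"],
    "s4": ["message", "reply", "message"],
}
scores = bootstrap_stability(sessions, toy_cluster, n_boot=20)
print(scores)
```

Because the placeholder rule assigns each session independently of the others, its clusters are trivially stable under resampling. A real clustering algorithm plugged in as `cluster_fn` would score lower wherever its clusters depend on which sessions happen to be sampled.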
Dorna Bandari is a data scientist at Pinterest, where she specializes in developing new machine-learning models in a broad range of product areas, from concept creation to productionization.