Put AI to Work
April 15-18, 2019
New York, NY

Industrialized capsule networks for text analytics

Vijay Agneeswaran (Walmart Labs), Abhishek Kumar (Publicis Sapient)
2:40pm-3:20pm Wednesday, April 17, 2019
Case Studies, Machine Learning
Location: Sutton South
Secondary topics: Models and Methods, Text, Language, and Speech

Who is this presentation for?

  • Data scientists, data engineers, ML engineers, data architects, and CxOs



Prerequisite knowledge

  • Familiarity with deep learning and capsule networks

What you'll learn

  • Understand the motivation for capsule networks and how they can be used in text analytics
  • Explore recurrent capsule networks (RCNs) and an implementation of RCNs in TensorFlow/PyTorch
  • Discover how to benchmark capsule networks with dynamic routing and recurrent capsule networks for a real multilabel text classification use case for news categorization


Multilabel text classification is an interesting problem where multiple tags or categories may have to be associated with the given text/documents. Multilabel text classification occurs in numerous real-world scenarios, for instance, in news categorization and in bioinformatics (such as the gene classification problem, see Zafer Barutcuoglu et al. 2006). The Kaggle dataset is representative of the problem.
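As a minimal sketch of what "multiple tags per document" means in practice, each document's tag set can be encoded as a 0/1 vector over the label vocabulary. The label names and the `multi_hot` helper below are hypothetical, chosen only for illustration; they are not taken from the talk or from any particular dataset.

```python
from typing import List, Sequence


def multi_hot(tags: Sequence[str], vocabulary: Sequence[str]) -> List[int]:
    """Return a 0/1 vector with a 1 for every vocabulary label present in `tags`."""
    tag_set = set(tags)
    unknown = tag_set - set(vocabulary)
    if unknown:
        raise ValueError(f"labels not in vocabulary: {sorted(unknown)}")
    return [1 if label in tag_set else 0 for label in vocabulary]


# Hypothetical label vocabulary and article tags (assumptions for illustration).
LABELS = ["business", "politics", "sports", "technology"]
article_tags = ["technology", "business"]

encoded = multi_hot(article_tags, LABELS)
print(encoded)  # [1, 0, 0, 1]
```

Unlike single-label classification, more than one position in the target vector can be 1, so a model typically scores each label independently rather than choosing exactly one.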

Several other interesting problems in text analytics exist, such as abstractive summarization, sentiment analysis, search and information retrieval, entity resolution, document categorization, document clustering, and machine translation. Deep learning has been applied to many of these problems. For instance, “Effective Use of Word Order for Text Categorization with Convolutional Neural Networks” gives an early approach to applying a convolutional network to make effective use of word order in text categorization. Recurrent neural networks (RNNs) have been effective in various text analytics tasks. Significant progress has been achieved in language translation by modeling machine translation using an encoder-decoder approach with the encoder formed by a neural network.

However, as shown in “Capsule Networks for Protein Structure Classification and Prediction,” certain cases require modeling the hierarchical relationships in text data. This is difficult to achieve with traditional deep learning networks, because linguistic knowledge may have to be incorporated into these networks to achieve high accuracy. Moreover, such networks do not consider hierarchical relationships between local features, because the pooling operations of CNNs lose information about those relationships.
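The information loss from pooling can be illustrated with a toy example. The sketch below is an assumption-laden illustration, not the speakers' code. It applies non-overlapping max pooling to two made-up activation sequences in which the strong activations sit at different positions, and both collapse to the same pooled output.

```python
from typing import List, Sequence


def max_pool_1d(features: Sequence[float], window: int) -> List[float]:
    """Non-overlapping max pooling over a 1-D sequence of activations."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [max(features[i:i + window]) for i in range(0, len(features), window)]


# Hypothetical activations of a single filter for two different word orders.
order_a = [0.9, 0.1, 0.2, 0.8]
order_b = [0.1, 0.9, 0.8, 0.2]

pooled_a = max_pool_1d(order_a, window=2)
pooled_b = max_pool_1d(order_b, window=2)
print(pooled_a, pooled_b)  # [0.9, 0.8] [0.9, 0.8]
```

After pooling, the two inputs are indistinguishable: where each feature occurred inside its window is discarded. Capsule networks aim to retain this kind of positional and part-whole information.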

Vijay Agneeswaran and Abhishek Kumar share an industrial-scale use case of capsule networks they have implemented for a client in the realm of text analytics for news categorization. They demonstrate the performance of capsule networks on the news categorization task using precision, recall, and F1 metrics, benchmark recurrent capsule networks on the same task, and compare the two implementations against a baseline model. They also discuss how to tune key hyperparameters of capsule networks, such as batch size, number and size of filters, initial learning rate, number of capsules, and dimension of capsules. Vijay and Abhishek conclude by detailing some of the key challenges they faced along the way.
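For readers unfamiliar with how precision, recall, and F1 apply to multilabel outputs, here is a minimal sketch. It assumes micro-averaging (counts pooled over all documents and labels); the abstract does not say which averaging scheme the speakers use. The `micro_prf` function and the toy label matrices are hypothetical.

```python
from typing import Sequence, Tuple


def micro_prf(
    y_true: Sequence[Sequence[int]],
    y_pred: Sequence[Sequence[int]],
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 for 0/1 multilabel matrices."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if p == 1 and t == 1:
                tp += 1  # label predicted and actually present
            elif p == 1 and t == 0:
                fp += 1  # label predicted but absent
            elif p == 0 and t == 1:
                fn += 1  # label present but missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data (assumption): two documents, three labels.
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 1]]

precision, recall, f1 = micro_prf(y_true, y_pred)
print(f"{precision:.3f} {recall:.3f} {f1:.3f}")  # 0.667 0.667 0.667
```

Micro-averaging weights frequent labels more heavily; macro-averaging (computing the metrics per label and then taking the mean) is a common alternative when rare categories matter.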

Topics include:

  • Motivation for capsule networks and how they can be used in text analytics
  • Overview of recurrent capsule networks
  • Implementation of RCNs in TensorFlow/PyTorch
  • Benchmarking of capsule networks with dynamic routing and recurrent capsule networks for a real multilabel text classification use case for news categorization

Vijay Agneeswaran

Walmart Labs

Dr. Vijay Srinivas Agneeswaran holds a bachelor’s degree in computer science and engineering from SVCE, Madras University (1998), an MS (by research) from IIT Madras (2001), and a PhD from IIT Madras (2008), and he held a postdoctoral research fellowship at the LSIR Labs, Swiss Federal Institute of Technology, Lausanne (EPFL). He currently heads data sciences R&D at Walmart Labs, India. He has spent the last eighteen years creating intellectual property and building data-based products in industry and academia. In his current role, he heads the machine learning platform development and data science foundation teams, which provide platform and intelligent services for Walmart businesses across the world. Previously, he led the team that delivered real-time hyper-personalization for a global auto major, as well as other work for clients across domains such as retail, banking and finance, telecom, and automotive. He has built PMML support into Spark/Storm and implemented several machine learning algorithms, such as LDA and random forests, over Spark. He led a team that designed and implemented a big data governance product for role-based, fine-grained access control inside Hadoop YARN. He and his team have also built the first distributed deep learning framework on Spark. He has been a professional member of the ACM and a senior member of the IEEE for more than 10 years. He has five full US patents and has published in leading journals and conferences, including IEEE transactions. His research interests include distributed systems, artificial intelligence, big data, and other emerging technologies.


Abhishek Kumar

Publicis Sapient

Abhishek Kumar is a senior manager of data science in Publicis Sapient’s India office, where he looks after scaling up the data science practice by applying machine learning and deep learning techniques to domains such as retail, ecommerce, marketing, and operations. Abhishek is an experienced data science professional and technical team lead specializing in building and managing data products from conceptualization to the deployment phase and interested in solving challenging machine learning problems. Previously, he worked in the R&D center for the largest power-generation company in India on various machine learning projects involving predictive modeling, forecasting, optimization, and anomaly detection and led the center’s data science team in the development and deployment of data science-related projects in several thermal and solar power plant sites. Abhishek is a technical writer and blogger as well as a Pluralsight author and has created several data science courses. He’s also a regular speaker at various national and international conferences and universities. Abhishek holds a master’s degree in information and data science from the University of California, Berkeley. Abhishek has spoken at past O’Reilly conferences, including Strata 2019, Strata 2018, and AI 2019.