Industrialized Capsule Networks for Text Analytics
Who is this presentation for?data scientists, data engineers, ML engineers, CxOs, data architects
Multi-label text classification is an interesting problem where multiple tags or categories may have to be associated with the given text/documents. Multi-label text classification occurs in numerous real-world scenarios, for instance, in news categorization and in bioinformatics (gene classification problem, see [Zafer Barutcuoglu et. al 2006]). Kaggle data set is representative of the problem: https://www.kaggle.com/jhoward/nb-svm-strong-linear-baseline/data.
Several other interesting problem in text analytics exist, such as abstractive summarization [Chen, Yen-Chun 2018], sentiment analysis, search and information retrieval, entity resolution, document categorization, document clustering, machine translation etc. Deep learning has been applied to solve many of the above problems – for instance, the paper [Rie Johnson et. al 2015] gives an early approach to applying a convolutional network to make effective use of word order in text categorization. Recurrent Neural Networks (RNNs) have been effective in various tasks in text analytics, as explained here. Significant progress has been achieved in language translation by modelling machine translation using an encoder-decoder approach with the encoder formed by a neural network [Dzmitry Bahdanau et. al 2014].
However, as shown in [Dan Rosa de Jesus et. al 2018] , certain cases require modelling the hierarchical relationship in text data and is difficult to achieve with traditional deep learning networks because linguistic knowledge may have to be incorporated in these networks to achieve high accuracy. Moreover, deep learning networks do not consider hierarchical relationships between local features as pooling operation of CNNs lose information about the hierarchical relationships.
We show one industrial scale use case of capsule networks which we have implemented for our client in the realm of text analytics – news categorization. We show, using the precision, recall and F1 metrics the performance of capsule networks on the news categorization task. We also benchmark the performance of recurrent capsule networks [[Yequan Wang et. al 2018]] for the same task and compare the two implementations against a baseline model. Importantly, we discuss how to tune key hyper-parameters of capsule networks such as batch size, number of filters and size of filters, initial learning rate, number of capsules and dimension of capsules. We also discuss the key challenges faced.
- Motivation for capsule networks and how they can be used in text analytics.
- Overview of recurrent capsule networks
- Implementation RCNs in TensorFlow/PyTorch.
- Benchmarking of capsule networks with dynamic routing and recurrent capsule networks for a real multi-label text classification use case for news categorization.
- [Zafer Barutcuoglu et. al 2006] Zafer Barutcuoglu, Robert E. Schapire, and Olga G. Troyanskaya. 2006. Hierarchical multi-label prediction of gene function. Bioinformatics 22, 7 (April 2006), 830-836. DOI=http://dx.doi.org/10.1093/bioinformatics/btk048
- [Rie Johnson et. al 2015] Rie Johnson, Tong Zhang: Effective Use of Word Order for Text Categorization with Convolutional Neural Networks. HLT-NAACL 2015: 103-112.
[Dzmitry Bahdanau et. al 2014] Bahdanau, Dzmitry et al. “Neural Machine Translation by Jointly Learning to Align and Translate.” CoRR abs/1409.0473 (2014).
- [Dan Rosa de Jesus et. al 2018] Dan Rosa de Jesus, Julian Cuevas, Wilson Rivera, Silvia Crivelli (2018). “Capsule Networks for Protein Structure Classification and Prediction”,
available at https://arxiv.org/abs/1808.07475.
- [Yequan Wang et. al 2018] Yequan Wang, Aixin Sun, Jialong Han, Ying Liu, and Xiaoyan Zhu. 2018. Sentiment Analysis by Capsules. In Proceedings of the 2018 World Wide Web Conference (WWW ’18). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, 1165-1174. DOI: https://doi.org/10.1145/3178876.3186015
- Chen, Yen-Chun and Bansal, Mohit (2018), “Fast Abstractive Summarization with Reinforce-Selected Sentence Rewriting”, eprint arXiv:1805.11080.
Prerequisite knowledgeBasics of deep learning and capsule networks.
What you'll learn* Motivation for capsule networks and how they can be used in text analytics. * Overview of recurrent capsule networks * Implementation RCNs in TensorFlow/PyTorch. * Benchmarking of capsule networks with dynamic routing and recurrent capsule networks for a real multi-label text classification use case for news categorization.
Vijay Srinivas Agneeswaran is a senior director of technology at Publicis Sapient. Vijay has spent the last 12 years creating intellectual property and building products in the big data area at Oracle, Cognizant, and Impetus, including building PMML support into Spark/Storm and implementing several machine learning algorithms, such as LDA and random forests, over Spark. He also led a team that build a big data governance product for role-based, fine-grained access control inside of Hadoop YARN and built the first distributed deep learning framework on Spark. Earlier in his career, Vijay was a postdoctoral research fellow at the LSIR Labs within the Swiss Federal Institute of Technology, Lausanne (EPFL). He is a senior member of the IEEE and a professional member of the ACM. He holds four full US patents and has published in leading journals and conferences, including _IEEE Transactions.
His research interests include distributed systems, cloud, grid, peer-to-peer computing, machine learning for big data, and other emerging technologies.
Vijay holds a bachelor’s degree in computer science and engineering from SVCE, Madras University, an MS (by research) from IIT Madras, and a PhD from IIT Madras.
Abhishek Kumar is a Senior Manager, Data science in Sapient’s Bangalore office, where he looks after scaling up the data science practice by applying machine learning and deep learning techniques to domains such as retail, ecommerce, marketing, and operations. Abhishek is an experienced data science professional and technical team lead specializing in building and managing data products from conceptualization to deployment phase and interested in solving challenging machine learning problems. Previously, he worked in the R&D center for the largest power-generation company in India on various machine learning projects involving predictive modeling, forecasting, optimization, and anomaly detection and led the center’s data science team in the development and deployment of data science-related projects in several thermal and solar power plant sites. Abhishek is a technical writer and blogger as well as a Pluralsight author and has created several data science courses. He is also a regular speaker at various national and international conferences (including strata conference ) and universities. Abhishek holds a master’s degree in information and data science from the University of California, Berkeley.
Leave a Comment or Question
Help us make this conference the best it can be for you. Have questions you'd like this speaker to address? Suggestions for issues that deserve extra attention? Feedback that you'd like to share with the speaker and other attendees?
Join the conversation here (requires login)
Diversity and Inclusion Sponsor
For conference registration information and customer service
For more information on community discounts and trade opportunities with O’Reilly conferences
For information on exhibiting or sponsoring a conference
View a complete list of O'Reilly AI contacts