As more people come online across the globe, growing numbers of them are voicing their opinions on social media, review sites, and elsewhere. Given such valuable data, modern deep learning-based sentiment analysis methods excel at determining the sentiment of what is being said about companies and products. Unfortunately, these methods require substantial amounts of training data. This is a significant problem for many of the world’s languages, for which resources are costly to obtain and training data is scarce, especially since new training data is needed for each domain and genre. A model trained on movie reviews, for instance, will fare very poorly when assessing digital camera reviews, let alone social media postings such as tweets. To make matters worse, many languages have peculiarities that make them significantly more challenging than English.
Gerard de Melo shares approaches to overcome these challenges, demonstrating how to exploit large amounts of surrogate data to learn word representations that are custom-tailored for sentiment. For instance, the word “hot” is often positive when referring to music but tends to be negative when referring to the temperature in a hotel room. He then outlines a bespoke deep convolutional network that is particularly adept at operating on these representations. Gerard concludes by demonstrating how developers can use the resulting freely available resources to perform sentiment and emotion analysis in a number of different natural languages.
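As a rough illustration of the “hot” example, the toy Python sketch below gives each word one polarity score per domain and slides a small convolution filter over the sequence. Everything in it is an assumption made for illustration: the domain names, the scores, and the helper names (`SENTIMENT_VECTORS`, `embed`, `conv_features`, `domain_polarity`) are invented, and the hand-set filter with strongest-response pooling merely stands in for the learned convolutional network, whose details the abstract does not give.

```python
# Toy sketch only: domains, scores, and function names are hypothetical.
DOMAINS = ("music", "hotel")

# Invented polarity scores in [-1, 1], one value per domain.
SENTIMENT_VECTORS = {
    "hot":   (0.8, -0.6),   # positive for music, negative for hotel rooms
    "great": (0.9, 0.9),
    "noisy": (-0.3, -0.8),
}


def embed(tokens):
    """Map tokens to per-domain sentiment vectors; unknown words get zeros."""
    zero = (0.0,) * len(DOMAINS)
    return [SENTIMENT_VECTORS.get(tok.lower(), zero) for tok in tokens]


def conv_features(vectors, kernel):
    """Valid 1D convolution: one dot product per window of len(kernel) tokens."""
    width = len(kernel)
    feats = []
    for start in range(len(vectors) - width + 1):
        window = vectors[start:start + width]
        feats.append(sum(w * x
                         for k_row, v_row in zip(kernel, window)
                         for w, x in zip(k_row, v_row)))
    return feats


def domain_polarity(tokens, domain, width=2):
    """Average one domain's channel per window, then keep the strongest response."""
    channel = DOMAINS.index(domain)
    row = tuple(1.0 / width if i == channel else 0.0
                for i in range(len(DOMAINS)))
    kernel = [row] * width  # hand-set filter; a real network would learn this
    feats = conv_features(embed(tokens), kernel)
    return max(feats, key=abs) if feats else 0.0


if __name__ == "__main__":
    print(domain_polarity("this track is hot".split(), "music"))
    print(domain_polarity("the room was hot".split(), "hotel"))
```

With these invented scores, the same word “hot” yields a positive response under the music filter and a negative one under the hotel filter, which is the kind of domain sensitivity the talk describes.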
Gerard de Melo is an assistant professor of computer science at Rutgers University, where he heads a team of researchers working on big data analytics, natural language processing, and web mining. Gerard’s research projects include UWN/MENTA, one of the largest multilingual knowledge bases, and Lexvo.org, an important hub in the web of data. Previously, he was a faculty member at Tsinghua University, one of China’s most prestigious universities, where he headed the Web Mining and Language Technology Group, and a visiting scholar at UC Berkeley, where he worked in the ICSI AI Group. He serves as an editorial board member for Computational Intelligence, the Journal of Web Semantics, the Springer Language Resources and Evaluation journal, and the Language Science Press TMNLP book series. Gerard has published over 80 papers, with best paper or demo awards at WWW 2011, CIKM 2010, ICGL 2008, and the NAACL 2015 Workshop on Vector Space Modeling, as well as an ACL 2014 best paper honorable mention, a best student paper award nomination at ESWC 2015, and a thesis award for his work on graph algorithms for knowledge modeling. He holds a PhD in computer science from the Max Planck Institute for Informatics.