Biomedical named entity recognition is a critical step for complex biomedical NLP tasks, such as understanding the interactions between different entity types (for example, the drug-disease relationship or the gene-protein relationship). Feature generation for such tasks is often complex and time-consuming. However, neural networks can obviate the need for feature engineering and take raw data as input.
Mohamed AbdelHady and Zoran Dzunic demonstrate how to build a domain-specific entity extraction system from unstructured text using deep learning. In the model, domain-specific word embedding vectors are trained with the word2vec algorithm on a Spark cluster using millions of Medline PubMed abstracts. These vectors are then used as features to train an LSTM recurrent neural network for entity extraction, using Keras with TensorFlow or CNTK on a GPU-enabled Azure Data Science Virtual Machine (DSVM). Results show that training a domain-specific word embedding model boosts performance when compared to embeddings trained on generic data such as Google News.
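To make the "embeddings as features" step concrete, here is a minimal sketch of how pre-trained word vectors and token-level entity tags might be turned into fixed-length inputs for an LSTM tagger. It is not the speakers' code: the tiny two-dimensional vectors, the BIO tag names, the `<pad>`/`<unk>` conventions, and the helper names `build_vocab` and `encode` are all assumptions made for illustration. In a real pipeline the vectors would come from the word2vec model trained on PubMed abstracts.

```python
from typing import Dict, List, Sequence, Tuple

PAD, UNK = "<pad>", "<unk>"  # hypothetical special tokens


def build_vocab(vectors: Dict[str, Sequence[float]]) -> Tuple[Dict[str, int], List[List[float]]]:
    """Build a word-to-row lookup and an embedding matrix.

    Row 0 is reserved for padding and row 1 for out-of-vocabulary words;
    both are zero vectors in this sketch.
    """
    dim = len(next(iter(vectors.values())))
    word_to_row = {PAD: 0, UNK: 1}
    matrix = [[0.0] * dim, [0.0] * dim]
    for word in sorted(vectors):
        word_to_row[word] = len(matrix)
        matrix.append(list(vectors[word]))
    return word_to_row, matrix


def encode(tokens: Sequence[str], tags: Sequence[str],
           word_to_row: Dict[str, int], tag_to_id: Dict[str, int],
           max_len: int) -> Tuple[List[int], List[int]]:
    """Map tokens and tags to integer ids, truncated or padded to max_len.

    Padded positions are labeled "O" here for simplicity; a real model
    would more likely mask them out of the loss.
    """
    rows = [word_to_row.get(t.lower(), word_to_row[UNK]) for t in tokens][:max_len]
    labels = [tag_to_id[t] for t in tags][:max_len]
    pad = max_len - len(rows)
    return rows + [word_to_row[PAD]] * pad, labels + [tag_to_id["O"]] * pad


# Toy stand-in for word2vec output (made-up numbers).
vectors = {"aspirin": [0.1, 0.3], "treats": [0.0, 0.2], "headache": [0.4, 0.1]}
# Hypothetical BIO tag set for two entity types.
tag_to_id = {"O": 0, "B-Drug": 1, "I-Drug": 2, "B-Disease": 3, "I-Disease": 4}

word_to_row, matrix = build_vocab(vectors)
rows, labels = encode(
    ["Aspirin", "treats", "severe", "headache"],
    ["B-Drug", "O", "O", "B-Disease"],
    word_to_row, tag_to_id, max_len=6,
)
print(rows)    # indices into the embedding matrix, one per token position
print(labels)  # tag ids aligned with rows
```

The `matrix` would typically initialize the embedding layer of the network, while `rows` and `labels` form one training example. Swapping the toy dictionary for generic vectors (such as ones trained on Google News) versus domain-specific ones is the kind of comparison the session reports on.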
Mohamed AbdelHady is a senior data scientist on the algorithms and data science (ADS) team within the AI+R Group at Microsoft, where he focuses on machine learning applications for text analytics and natural language processing. Mohamed works with Microsoft product teams and external customers to deliver advanced technologies that extract useful and actionable insights from unstructured free text such as search queries, social network messages, product reviews, and customer feedback. Previously, he spent three years at Microsoft Research’s Advanced Technology Labs. He holds a PhD in machine learning from the University of Ulm in Germany.
Zoran Dzunic is a data scientist on the algorithms and data science (ADS) team within the AI+R Group at Microsoft, where he focuses on machine learning applications for text analytics and natural language processing. He holds a PhD and a master’s degree from MIT, where he focused on Bayesian probabilistic inference, and a bachelor’s degree from the University of Nis in Serbia.