Named entity recognition at scale with deep learning





Who is this presentation for?
- Software engineers, data scientists, and ML engineers
Level
Intermediate
Description
Twitter is what’s happening in the world right now, and operating at such a global scale brings massive engineering challenges. To connect users with the best content, Twitter needs to build a deep understanding of its text content. That understanding must scale to annotate more than 500 million tweets per day, run in real time to accommodate the live nature of Twitter, and be multilingual to cover the many languages Twitter supports.
Sijun He and Ali Mollahosseini offer insights into how Twitter Cortex built and productionized a deep learning-based named entity recognition (NER) system to address those challenges. They highlight Twitter’s experiments with state-of-the-art models (e.g., BERT) and learning methods (e.g., semisupervised learning and active learning), as well as how Twitter has balanced keeping in sync with recent developments in natural language processing (NLP) against engineering needs.
Prerequisite knowledge
- A basic understanding of ML, NLP, and deep learning
What you'll learn
- Learn how Twitter developed its NER system from data sampling to labeling and from model development to serving infrastructure
- Understand the models for NER and serving infrastructure needed to put those models into production

Sijun He
Sijun He is a machine learning engineer at Twitter Cortex, where he works on content understanding with deep learning and NLP. Previously, he was a data scientist at Autodesk. Sijun holds an MS in statistics from Stanford University.

Ali Mollahosseini
Ali Mollahosseini is a senior machine learning engineer and tech lead of the content understanding and applied deep learning (CUAD) team at Twitter Cortex. His research focuses on NLP and developing neural network architectures to improve Twitter’s understanding of content on the platform using the latest advances in deep learning. He received his PhD in computer engineering from the University of Denver. He’s published 15 papers in prestigious journals and conferences and has two patents with more than 500 citations.