Presented By
O’Reilly + Cloudera
Make Data Work
29 April–2 May 2019
London, UK

Schedule: Text and Language processing and analysis sessions

13:30–17:00 Tuesday, 30 April 2019
Data Science, Machine Learning & AI
Location: Capital Suite 14
This is a hands-on tutorial for scalable NLP using the highly performant, highly scalable open-source Spark NLP library. You’ll spend about half your time coding as you work through four sections, each with an end-to-end working codebase that you can change and improve.
11:15–11:55 Wednesday, 1 May 2019
Data Engineering and Architecture
Location: Capital Suite 8/9
Moty Fania (Intel)
In this session, Moty Fania will share his experience of implementing a Sales AI platform. The platform processes millions of website pages and sifts through millions of tweets per day. It is based on unique open source technologies and was designed for real-time data extraction and actuation. This session highlights the key lessons learned, with a thorough review of the architecture.
11:15–11:55 Wednesday, 1 May 2019
Data Science, Machine Learning & AI
Location: Capital Suite 14
In this talk you will learn how to use Spark NLP and Apache Spark to standardize semi-structured text. You will see how Indeed standardizes resume content at scale.
12:05–12:45 Wednesday, 1 May 2019
Data Science, Machine Learning & AI
Location: Capital Suite 14
Yves Peirsman (NLP Town)
In this age of big data, NLP professionals are all too often faced with a lack of data: written language is abundant, but labelled texts are much harder to come by. In my talk, I will discuss the most effective ways of addressing this challenge, from the semi-automatic construction of labelled training data to transfer learning approaches that reduce the need for labelled training examples.
12:05–12:45 Wednesday, 1 May 2019
Data Science, Machine Learning & AI, Expo Hall
Location: Expo Hall (Capital Hall N24)
Matthew Honnibal (Explosion AI)
Matthew Honnibal shares "one weird trick" that can give your NLP project a better chance of success: avoid a waterfall methodology where data definition, corpus construction, modeling, and deployment are performed as separate phases of work.
14:05–14:45 Wednesday, 1 May 2019
Data Science, Machine Learning & AI
Location: Capital Suite 14
Maryam Jahanshahi (TapRecruit)
In this talk I will discuss exponential family embeddings, which are methods that extend the idea behind word embeddings to other data types. I will describe how we used dynamic embeddings on our large corpus of job descriptions to understand how data science skill sets have changed over the last 3 years. The key takeaway is that these models can enrich analysis of specialized datasets.
14:05–14:45 Wednesday, 1 May 2019
Data Science, Machine Learning & AI
Location: Capital Suite 17
David Low (Pand.ai)
Transfer learning has proven to be a tremendous success in the computer vision field as a result of the ImageNet competition. In the past months, the natural language processing field has witnessed several breakthroughs with transfer learning, namely ELMo, the OpenAI Transformer, and ULMFiT. In this talk, David will showcase the use of transfer learning in an NLP application with SOTA accuracy.
16:35–17:15 Wednesday, 1 May 2019
Data Science, Machine Learning & AI
Location: Capital Suite 14
Divya Choudhary (University of Southern California)
Addresses are the most unorganized textual data. In fact, structuring addresses has almost led to a new stream of NLP itself. Who would've imagined that address text data could be used to develop one of the coolest product features: finding the most precise pickup and drop-off locations for ecommerce, logistics, food delivery, and ride-hailing companies? Divya Choudhary explains.
17:25–18:05 Wednesday, 1 May 2019
Data Science, Machine Learning & AI
Location: Capital Suite 15/16
Weifeng Zhong (Mercatus Center at George Mason University)
We developed a machine learning algorithm to “read” the People’s Daily, the official newspaper of the Communist Party of China, and predict changes in China’s policy priorities using only the information in the newspaper. The output of this algorithm, which we call the Policy Change Index (PCI) of China, turns out to be a leading indicator of the actual policy changes in China since 1951.
11:15–11:55 Thursday, 2 May 2019
Data Science, Machine Learning & AI
Location: Capital Suite 14
David Dogon (Van Lanschot Kempen)
This talk discusses a best-practice use case for detecting fraud at a financial institution. Where traditional systems fall short, machine learning models can provide a solution. Sifting through large amounts of transaction data, external hit lists, and unstructured text data, we managed to build a dynamic and robust monitoring system that successfully detects unwanted client behavior.
12:05–12:45 Thursday, 2 May 2019
Data Science, Machine Learning & AI
Location: Capital Suite 14
Moshe Wasserblat presents an overview of NLP Architect, an open source deep learning NLP library that provides SOTA NLP models, making it easy for researchers to implement NLP algorithms and for data scientists to build NLP-based solutions that extract insight from textual data to improve business operations.
14:55–15:35 Thursday, 2 May 2019
Data Science, Machine Learning & AI, Expo Hall
Location: Expo Hall (Capital Hall N24)
Ines Montani (Explosion AI)
Ines Montani offers an overview of spaCy's new support for efficient and easy transfer learning and shows you how to kickstart new NLP projects with a new annotation tool, Prodigy Scale.