Presented By O’Reilly and Cloudera
Make Data Work
September 11, 2018: Training & Tutorials
September 12–13, 2018: Keynotes & Sessions
New York, NY

Schedule: Text and Language processing and analysis sessions

9:00am–12:30pm Tuesday, 09/11/2018
Location: 1A 21/22 Level: Intermediate
Garrett Hoffman (StockTwits)
Average rating: 4.75 (4 ratings)
Garrett Hoffman walks you through deep learning methods for natural language processing and natural language understanding tasks, using a live example in Python and TensorFlow with StockTwits data. Methods include word2vec, recurrent neural networks and variants (LSTM, GRU), and convolutional neural networks.
1:30pm–5:00pm Tuesday, 09/11/2018
Location: 1A 21/22 Level: Intermediate
David Talby (Pacific AI), Claudiu Branzan (Accenture), Alex Thomas (John Snow Labs)
Average rating: 3.00 (7 ratings)
David Talby, Claudiu Branzan, and Alex Thomas lead a hands-on tutorial for scalable NLP using the highly performant, highly scalable open source Spark NLP library. You’ll spend about half your time coding as you work through four sections, each with an end-to-end working codebase that you can change and improve.
1:15pm–1:55pm Wednesday, 09/12/2018
Location: 1A 06/07 Level: Intermediate
James Dreiss (Reuters)
Average rating: 3.67 (3 ratings)
James Dreiss discusses the challenges in building a content recommendation system for one of the largest news sites in the world. The particularities of the system include developing a scrolling newsfeed and the use of document vectors for semantic representation of content.
4:35pm–5:15pm Wednesday, 09/12/2018
Location: 1A 06/07 Level: Beginner
Masha Westerlund (Investopedia)
Average rating: 5.00 (2 ratings)
Businesses rely on user data to power their sites, products, and sales. Can we give back by sharing those insights with users? Masha Westerlund explains how Investopedia harnessed reader data to build an index that tracks market anxiety and moves with the VIX, a proprietary measure of market volatility. You'll see how thinking outside the box helps turn data into tools for users, not stakeholders.
5:25pm–6:05pm Wednesday, 09/12/2018
Location: 1A 08 Level: Beginner
Andreea Kremm (Netex Group), Mohammed Ibraaz Syed (UCLA)
Average rating: 4.00 (2 ratings)
Narrative economics studies the impact of popular narratives and stories on economic fluctuations in the context of human interests and emotions. Andreea Kremm and Mohammed Ibraaz Syed describe the use of emotion analysis, entity relationship extraction, and topic modeling in modeling narratives from written human communication.
5:25pm–6:05pm Wednesday, 09/12/2018
Location: Expo Hall
Mike Tung (Diffbot)
Mike Tung offers an overview of available open source and commercial knowledge graphs and explains how consumer and business applications are already taking advantage of them to provide intelligent experiences and enhanced business efficiency. Mike then discusses what's coming in the future.
11:20am–12:00pm Thursday, 09/13/2018
Location: 1A 06/07 Level: Intermediate
Andrew Montalenti
Average rating: 5.00 (1 rating)
What can we learn from a one-billion-person live poll of the internet? Andrew Montalenti explains how his company has gathered a unique dataset of news reading sessions from billions of devices, peaking at over two million sessions per minute across thousands of high-traffic news and information websites, and how the company uses this data to unearth the secrets behind online content.
1:10pm–1:50pm Thursday, 09/13/2018
Location: 1A 06/07 Level: Intermediate
David Talby (Pacific AI), Alberto Andreotti (John Snow Labs), Stacy Ashworth (SelectData), Tawny Nichols (SelectData)
Average rating: 3.00 (4 ratings)
David Talby, Alberto Andreotti, Stacy Ashworth, and Tawny Nichols outline a question-answering system for accurately extracting facts from free-text patient records and share best practices for training domain-specific deep learning NLP models. The solution is based on Spark NLP, an extension of Spark ML that provides state-of-the-art performance and accuracy for natural language understanding.
2:00pm–2:40pm Thursday, 09/13/2018
Location: 1E 12/13 Level: Non-technical
Chiny Driscoll (MetiStream), Jawad Khan (Rush University Medical Center)
Average rating: 4.00 (5 ratings)
Chiny Driscoll and Jawad Khan offer an overview of a solution by Cloudera and MetiStream that lets healthcare providers automate the extraction, processing, and analysis of clinical notes within an electronic health record in batch or real time, improving care, identifying errors, and recognizing efficiencies in billing and diagnoses.