Natural language processing is a key component in many data science systems that must understand or reason about text. Common use cases include question answering, entity recognition, sentiment analysis, dependency parsing, de-identification, and natural language BI. Building such systems usually requires combining three types of software libraries: NLP annotation frameworks, machine learning frameworks, and deep learning frameworks.
David Talby, Claudiu Branzan, and Alex Thomas lead a hands-on introduction to scalable NLP using the highly performant, highly scalable open source Spark NLP library. You’ll spend about half your time coding as you work through four sections, each with an end-to-end working codebase that you can change and improve.
Outline:
Using Spark NLP to build an NLP pipeline that can understand text structure, grammar, and sentiment and perform entity recognition
Building a machine learning pipeline that includes and depends on NLP annotators to generate features
Using Spark NLP with TensorFlow to train deep learning models for state-of-the-art NLP
Advanced Spark NLP functionality that enables a scalable open source solution to more complex language understanding use cases
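As a rough illustration of the second outline item, here is a plain-Python sketch of an ML pipeline that depends on NLP annotators to generate features. Each stage reads a named input column and writes a named output column, so later stages build on earlier ones, and the last stage produces bag-of-words counts a downstream model could consume. Spark NLP annotators follow a similar input/output-column convention (`setInputCols` / `setOutputCol`) on top of Spark ML pipelines, but the `Annotator` class and `run_pipeline` function below are hypothetical stand-ins, not the Spark NLP API.

```python
from collections import Counter
from typing import Callable, Dict, List

# A "row" maps column names to values, loosely mirroring a DataFrame row.
Row = Dict[str, object]


class Annotator:
    """Hypothetical stand-in for an NLP annotator: reads one column, writes another."""

    def __init__(self, input_col: str, output_col: str, fn: Callable[[object], object]):
        self.input_col = input_col
        self.output_col = output_col
        self.fn = fn

    def transform(self, rows: List[Row]) -> List[Row]:
        # Return new rows with the output column added; input columns are kept as-is.
        return [{**row, self.output_col: self.fn(row[self.input_col])} for row in rows]


def run_pipeline(stages: List[Annotator], rows: List[Row]) -> List[Row]:
    # Each stage consumes columns produced by earlier stages, so order matters.
    for stage in stages:
        rows = stage.transform(rows)
    return rows


stages = [
    # Naive whitespace tokenizer (hypothetical; real tokenizers are far richer).
    Annotator("text", "token", lambda text: text.split()),
    # Normalizer: lowercase and strip trailing punctuation from each token.
    Annotator("token", "normalized",
              lambda toks: [t.lower().strip(".,!?") for t in toks]),
    # Feature stage: bag-of-words counts that a downstream ML model could use.
    Annotator("normalized", "features", lambda toks: Counter(t for t in toks if t)),
]

rows = run_pipeline(stages, [{"text": "Spark NLP scales. NLP at scale!"}])
print(rows[0]["features"])
```

Keeping every intermediate column (tokens, normalized tokens, features) on the row is a deliberate choice in this sketch: it makes each annotator's output inspectable and lets a later stage pick whichever representation it needs.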
David Talby is a chief technology officer at Pacific AI, helping fast-growing companies apply big data and data science techniques to solve real-world problems in healthcare, life science, and related fields. David has extensive experience in building and operating web-scale data science and business platforms, as well as building world-class, agile, distributed teams. Previously, he led business operations for Bing Shopping in the US and Europe at Microsoft's Bing Group, and at Amazon he built and ran distributed teams in Seattle and the UK that helped scale Amazon's financial systems. David holds a PhD in computer science and master's degrees in both computer science and business administration.
Alex Thomas is a data scientist at John Snow Labs. He’s used natural language processing (NLP) and machine learning with clinical data, identity data, and job data. He’s worked with Apache Spark since version 0.9 as well as with NLP libraries and frameworks including UIMA and OpenNLP.
Claudiu Branzan is an analytics senior manager in the Applied Intelligence Group at Accenture, based in Seattle, where he leverages his more than 10 years of expertise in data science, machine learning, and AI to promote the use and benefits of these technologies to build smarter solutions to complex problems. Previously, Claudiu held highly technical client-facing leadership roles in companies using big data and advanced analytics to offer solutions for clients in healthcare, high-tech, telecom, and payments verticals.
©2019, O'Reilly Media, Inc. • (800) 889-8969 or (707) 827-7019 • Monday-Friday 7:30am-5pm PT • All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners. • confreg@oreilly.com
Comments
We would like to thank everyone who supported Spark NLP and made it possible for the library to win the “Most Significant Open Source Project” Strata Data Award!
The recognition is exciting and humbling, and it makes the whole team feel the weight of the commitment we're making to the open source community with this project. We're back to coding today and working on the next release. Best of luck with your NLP and AI projects, and please share any feedback or suggestions with us.
Hi everyone! As requested, here are the slides from today’s tutorial:
https://docs.google.com/presentation/d/1Wx-br2v8EjFLJjjZcdKgiVunm2VOlmXug8IvW7o95E4/edit?usp=sharing
Please send us any additional questions, feedback, and suggestions for Spark NLP.
Spark NLP is a finalist for the most significant open source project at the Strata Data Awards – and we need your vote :-) Please text SPARKNLP to 22333 within the next 24 hours if you’re willing to help out. Thanks in advance!
Craig, I’m sorry you’re having issues setting up for the tutorial. We’ve had 5 people test the container and notebooks over the weekend.
Please follow the instructions under ‘Docker Setup’ on this page: https://github.com/JohnSnowLabs/spark-nlp-workshop
There are 3 steps involved: pulling the Docker image to your local machine, running it as a container, and opening the notebook in your browser using the token printed on the console.
You need to install Docker before doing this. This 3-step process installs a full environment with all the libraries, dependencies, data and notebooks you’ll need to make the most of the tutorial.
The directions you sent are pretty worthless… You can't access the links; they just take you around in a circle, recommending you don't use Toolbox unless you have to, and never lead to the download link you show. You need to test things before you put them out…