Introduction to Natural Language Processing in Python
Who is this presentation for?
Beginners, data analysts, business analysts
Prerequisite knowledge
Some programming experience (the tutorial is in Python, but familiarity with any language or tool such as R or advanced Excel would be useful) and some data analysis experience (the ability to read charts, interpret summary statistics, etc.)
Materials or downloads needed in advance
What you'll learn
Natural language processing (NLP) is an exciting branch of artificial intelligence (AI) that allows machines to break down and understand human language. As a data scientist, I often use NLP techniques to interpret text data that I’m working with for my analysis. During this tutorial, I plan to walk through text pre-processing techniques, machine learning techniques and Python libraries for NLP.
Text pre-processing techniques include tokenization, text normalization and data cleaning. Once the text is in a standard format, various machine learning techniques can be applied to better understand the data. These include popular modeling techniques for classifying emails as spam or not spam, or for scoring the sentiment of a tweet. Newer, more complex techniques can also be used, such as topic modeling, word embeddings or text generation with deep learning.
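As a rough illustration of the pre-processing and classification steps described above, here is a minimal sketch that uses only the Python standard library. It is not the tutorial's own code, which relies on libraries such as NLTK and scikit-learn. The stop-word list, the four training messages, and the function names (`preprocess`, `train_naive_bayes`, `classify`) are all hypothetical and chosen for illustration. The classifier is a small multinomial naive Bayes with add-one smoothing, one common choice for spam filtering.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list for illustration; NLP libraries ship fuller ones.
STOP_WORDS = {"a", "an", "the", "at", "with", "your", "to", "is"}


def preprocess(text):
    """Normalize, clean, and tokenize a piece of text."""
    text = text.lower()                    # normalization: lowercase everything
    text = re.sub(r"[^a-z\s]", " ", text)  # cleaning: keep letters and whitespace
    tokens = text.split()                  # tokenization: split on whitespace
    return [t for t in tokens if t not in STOP_WORDS]


def train_naive_bayes(examples):
    """Count tokens per label for a multinomial naive Bayes model."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(preprocess(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab


def classify(text, model):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_docs)  # log prior
        for token in preprocess(text):
            if token not in vocab:
                continue  # ignore words never seen during training
            # Add-one (Laplace) smoothing avoids log(0) for unseen pairs.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocab))
            )
        scores[label] = score
    return max(scores, key=scores.get)


# Toy training data, invented for this sketch.
TRAINING = [
    ("Win FREE money now!!!", "spam"),
    ("Claim your free prize now", "spam"),
    ("Meeting at noon tomorrow", "not spam"),
    ("Lunch with the team tomorrow?", "not spam"),
]

model = train_naive_bayes(TRAINING)
print(preprocess("Win FREE money now!!!"))
print(classify("Claim your FREE money", model))
print(classify("Team meeting tomorrow", model))
```

With real data, the same two stages would typically be handled by a library tokenizer and a library classifier, and the model would be evaluated on held-out messages rather than a handful of toy examples.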
We will walk through an example in a Jupyter Notebook that covers all of the steps of a text analysis project, using several Python NLP libraries, including NLTK, TextBlob, spaCy and gensim, along with standard data analysis and machine learning libraries such as pandas and scikit-learn.
Alice Zhao is currently a Senior Data Scientist at Metis, where she teaches 12-week data science bootcamps. Previously, she worked at Cars.com, where she started as the company’s first data scientist, supporting multiple functions from Marketing to Technology. During that time, she also co-founded a data science education startup, Best Fit Analytics Workshop, teaching weekend courses to professionals at 1871 in Chicago. Prior to becoming a data scientist, she worked at Redfin as an analyst and at Accenture as a consultant. She has her M.S. in Analytics and B.S. in Electrical Engineering, both from Northwestern University. She blogs about analytics and pop culture on A Dash of Data. Her blog post, “How Text Messages Change From Dating to Marriage” made it onto the front page of Reddit, gaining over half a million views in the first week. She is passionate about teaching and mentoring, and loves using data to tell fun and compelling stories.