Introduction to natural language processing in Python
Who is this presentation for?
- Data scientists or analysts
NLP is an exciting branch of AI that allows machines to break down and understand human language. Alice Zhao often uses NLP techniques to interpret text data for her analysis. Alice walks you through text preprocessing techniques, machine learning techniques, and Python libraries for NLP.
Text preprocessing techniques include tokenization, text normalization, and data cleaning. Once the text is in a standard format, various machine learning techniques can be applied to better understand the data. These include popular modeling techniques for classifying spam emails or scoring the sentiment of a tweet. Newer, more complex techniques can also be used, such as topic modeling, word embeddings, or text generation with deep learning.
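To make the preprocessing steps above concrete, here is a minimal sketch using only the Python standard library. It is not taken from the session notebooks, which use libraries such as NLTK and spaCy; the function names `preprocess` and `bag_of_words` and the tiny stopword list are assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real projects use a fuller list,
# for example the ones shipped with NLTK or spaCy).
STOPWORDS = {"a", "an", "and", "the", "is", "to", "of", "in"}


def preprocess(text):
    """Normalize, clean, and tokenize a string, then drop stopwords."""
    text = text.lower()                    # normalization: fold case
    text = re.sub(r"[^a-z\s]", " ", text)  # cleaning: keep only letters and whitespace
    tokens = text.split()                  # tokenization: split on whitespace
    return [t for t in tokens if t not in STOPWORDS]


def bag_of_words(tokens):
    """Count how often each token occurs (a simple standard format for modeling)."""
    return Counter(tokens)


tokens = preprocess("The cat sat on the mat, and the CAT purred!")
counts = bag_of_words(tokens)
print(tokens)
print(counts.most_common(1))
```

Whitespace splitting is the crudest form of tokenization: it turns a contraction such as "don't" into two tokens, which is one reason dedicated tokenizers are usually preferred in practice.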
You’ll work on an example in a Jupyter notebook that goes through all of the steps of a text analysis project, using several NLP libraries in Python including NLTK, TextBlob, spaCy, and gensim, along with the standard machine learning libraries, including pandas and scikit-learn.
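As a rough illustration of one modeling step in such a project, the spam classification mentioned earlier can be sketched as a small multinomial naive Bayes classifier. This is a standard-library stand-in rather than the session's own code, which would more likely use scikit-learn; the names `train_naive_bayes` and `predict` and the four toy training messages are made up for this example.

```python
import math
from collections import Counter


def train_naive_bayes(docs):
    """Collect per-label document counts, per-label word counts, and the vocabulary.

    docs is a list of (tokens, label) pairs.
    """
    label_counts = Counter()
    word_counts = {}
    vocab = set()
    for tokens, label in docs:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab


def predict(tokens, model):
    """Return the label with the highest log-probability for the given tokens."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Log prior: share of training documents carrying this label.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokens:
            # Laplace (add-one) smoothing so unseen words do not zero out the score.
            score += math.log((word_counts[label][token] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, invented purely for illustration.
training = [
    ("win free prize now".split(), "spam"),
    ("free money click now".split(), "spam"),
    ("meeting agenda for monday".split(), "ham"),
    ("lunch on monday".split(), "ham"),
]
model = train_naive_bayes(training)
print(predict("free prize".split(), model))
print(predict("monday meeting".split(), model))
```

With only four training messages the model is a demonstration of the mechanics, not a usable filter; a real project would train on a labeled corpus and evaluate on held-out data.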
Prerequisite knowledge
- Experience with programming and data analysis (the ability to read charts, interpret summary statistics, etc.)
- Familiarity with R, advanced Excel, etc. (useful but not required)
Materials or downloads needed in advance
- A laptop set up from the GitHub repo, which contains install instructions and notebooks
What you'll learn
- Learn NLP basics, including data cleaning, exploratory data analysis, sentiment analysis, topic modeling, and text generation
- Understand coding with Python
Alice Zhao is a senior data scientist at Metis, where she teaches 12-week data science bootcamps. Previously, she was the first data scientist at Cars.com, where she supported multiple functions from marketing to technology; cofounded Best Fit Analytics Workshop, a data science education startup through which she taught weekend courses to professionals at 1871 in Chicago; was an analyst at Redfin; and was a consultant at Accenture. She blogs about analytics and pop culture on A Dash of Data. Her blog post “How Text Messages Change From Dating to Marriage” made it onto the front page of Reddit, gaining over half a million views in the first week. She’s passionate about teaching and mentoring and loves using data to tell fun and compelling stories. She holds an MS in analytics and a BS in electrical engineering, both from Northwestern University.