Sep 23–26, 2019

Introduction to natural language processing in Python

Alice Zhao (Metis)
9:00am–12:30pm Tuesday, September 24, 2019
Location: 1A 23/24
Average rating: 4.54 (13 ratings)

Who is this presentation for?

  • Beginners, data analysts, and business analysts




NLP is an exciting branch of AI that allows machines to break down and understand human language. Data scientists often use NLP techniques to interpret text data for analysis. Alice Zhao walks you through text preprocessing techniques, machine learning techniques, and Python libraries for NLP.

Text preprocessing techniques include tokenization, text normalization, and data cleaning. Once the text is in a standard format, various machine learning techniques can be applied to better understand the data. These include popular modeling techniques for classifying emails as spam or not spam and for scoring the sentiment of a tweet. Newer, more complex techniques can also be used, such as topic modeling, word embeddings, and text generation with deep learning.
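As a rough illustration of the preprocessing steps named above (normalization, cleaning, tokenization), here is a minimal sketch using only the Python standard library. It is not taken from the tutorial's notebook; the `preprocess` and `bag_of_words` helpers and the tiny `STOP_WORDS` set are hypothetical names chosen for this example, and a real project would more likely lean on a library such as NLTK or spaCy.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "the", "is", "to", "and", "of", "in"}


def preprocess(text):
    """Return a list of cleaned, lowercased tokens with stop words removed."""
    text = text.lower()                    # normalization: fold case
    text = re.sub(r"[^a-z\s]", " ", text)  # cleaning: keep only letters and whitespace
    tokens = text.split()                  # tokenization: split on whitespace
    return [t for t in tokens if t not in STOP_WORDS]


def bag_of_words(tokens):
    """Count how often each token occurs (a simple 'standard format' for modeling)."""
    return Counter(tokens)


tokens = preprocess("Free prizes!!! Claim the FREE prize now.")
counts = bag_of_words(tokens)
print(tokens)
print(counts.most_common(1))
```

The token counts produced this way are one simple standard format that a classifier or sentiment model could take as input.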

You’ll see an example in a Jupyter notebook that goes through all of the steps of a text analysis project using several NLP libraries in Python, including NLTK, TextBlob, spaCy, and gensim, along with the standard machine learning libraries including pandas and scikit-learn.
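The notebook itself is not reproduced on this page. As a hedged stand-in for the spam classification task described in the abstract, the sketch below implements a small multinomial naive Bayes classifier with add-one smoothing in plain Python. The choice of naive Bayes, the function names `train_naive_bayes` and `predict`, and the four training messages are all assumptions made for illustration; in practice scikit-learn would typically handle this step.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Collect per-class document and word counts from (tokens, label) pairs."""
    class_counts = Counter()            # number of documents per label
    word_counts = defaultdict(Counter)  # word frequencies per label
    vocab = set()
    for tokens, label in labeled_docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, tokens):
    """Return the label with the highest log-probability for the given tokens."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for t in tokens:
            if t in vocab:  # words never seen in training are ignored
                # add-one (Laplace) smoothing avoids zero probabilities
                score += math.log((word_counts[label][t] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training messages, already tokenized (illustrative data only).
training = [
    ("win free prize now".split(), "spam"),
    ("free cash claim now".split(), "spam"),
    ("meeting agenda attached".split(), "ham"),
    ("lunch meeting tomorrow".split(), "ham"),
]
model = train_naive_bayes(training)
print(predict(model, "claim free prize".split()))
print(predict(model, "meeting agenda".split()))
```

With this toy data the first message is labeled "spam" and the second "ham"; four training examples are far too few for anything beyond demonstrating the mechanics.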

Prerequisite knowledge

  • Experience with programming (the tutorial is in Python, but familiarity with any language or tool, such as R or advanced Excel, would be useful) and data analysis (ability to read charts, interpret summary statistics, etc.)

Materials or downloads needed in advance

What you'll learn

  • Learn natural language processing basics including data cleaning, exploratory data analysis, sentiment analysis, topic modeling, and text generation—along with Python code

Alice Zhao


Alice Zhao is a senior data scientist at Metis, where she teaches 12-week data science bootcamps. Previously, she was the first data scientist and supported multiple functions from marketing to technology at; cofounded Best Fit Analytics Workshop, a data science education startup where she taught weekend courses to professionals at 1871 in Chicago; was an analyst at Redfin; and was a consultant at Accenture. She blogs about analytics and pop culture on A Dash of Data. Her blog post, “How Text Messages Change From Dating to Marriage,” made it onto the front page of Reddit, gaining over half a million views in the first week. She’s passionate about teaching and mentoring and loves using data to tell fun and compelling stories. She has an MS in analytics and a BS in electrical engineering, both from Northwestern University.

Comments on this page are now closed.


Alice Zhao | Senior Data Scientist
09/10/2019 10:25am EDT

Here’s the link to the Github repository:

Derek Kurth | Technologist
09/10/2019 10:20am EDT

Hi! Under “Materials or downloads needed in advance,” I see “downloaded Jupyter notebooks from the Github repository.” Would you please let me know which Github repository this refers to? Thank you!

