Presented by O'Reilly and Cloudera
Make Data Work
September 25–26, 2017: Training
September 26–28, 2017: Tutorials & Conference
New York, NY

Schedule: Text sessions

9:00am–5:00pm Tuesday, September 26, 2017
Spark & beyond
Location: 1A 08/10
Brooke Wenig (Databricks)
Brooke Wenig introduces you to Apache Spark 2.0 core concepts with a focus on Spark's machine learning library, using text mining on real-world data as the primary end-to-end use case.
9:00am–5:00pm Tuesday, September 26, 2017
Location: 1A 06/07
Ben Lorica (O'Reilly), Assaf Araki (Intel), Jacob Schreiber (University of Washington), Alex Ratner (Stanford University), Madeleine Udell (Cornell University), Yunsong Guo (Pinterest), Katherine Heller (Duke University), Alan Nichol (Rasa), Gerard de Melo (Rutgers University), Tamara Broderick (MIT), Inbal Tadeski (Anodot), Daniel Kang (Stanford University), Bichen Wu (UC Berkeley), Shaked Shammah (Hebrew University)
A full day of hardcore data science, exploring emerging topics and new areas of study made possible by vast troves of raw data and cutting-edge architectures for analyzing and exploring information. Along the way, leading data science practitioners teach new techniques and technologies to add to your data science toolbox.
1:30pm–5:00pm Tuesday, September 26, 2017
Data science & advanced analytics, Machine Learning & Data Science
Location: 1A 23/24 Level: Intermediate
David Talby (Pacific AI), Claudiu Branzan (Accenture), Alex Thomas (John Snow Labs)
Natural language processing is a key component in many data science systems that must understand or reason about text. David Talby, Claudiu Branzan, and Alex Thomas lead a hands-on tutorial on scalable NLP using spaCy for building annotation pipelines, TensorFlow for training custom machine-learned annotators, and Spark ML and TensorFlow for using deep learning to build and apply word embeddings.
2:05pm–2:45pm Wednesday, September 27, 2017
Eui-Hong Han (The Washington Post), Ling Jiang (The Washington Post)
Average rating: 4.50 (2 ratings)
The quality of online comments is critical to the Washington Post. However, managing the quality of the comment section currently requires costly manual moderation. Eui-Hong Han and Ling Jiang discuss ModBot, a machine learning-based tool developed for automatic comment moderation, and share the challenges they faced in developing ModBot and deploying it into production.
4:35pm–5:15pm Wednesday, September 27, 2017
Visualization & user experience
Location: 1E 15/16 Level: Beginner
Richard Brath (Uncharted Software), Scott Langevin (Uncharted Software)
Average rating: 4.50 (2 ratings)
Text analytics is advancing rapidly, and new visualization techniques for text are providing new capabilities. Richard Brath and Scott Langevin offer an overview of these new ways to organize massive volumes of text, characterize subjects, score synopses, and skim through large numbers of documents.
11:20am–12:00pm Thursday, September 28, 2017
Machine Learning & Data Science
Location: 1A 06/07 Level: Intermediate
Paco Nathan (
Average rating: 5.00 (3 ratings)
Paco Nathan demonstrates how to use PyTextRank, an open source Python implementation of TextRank built atop spaCy, datasketch, NetworkX, and other popular libraries, to prepare raw text for AI applications in media and learning. The approach moves beyond outdated techniques such as stemming, n-grams, and bag-of-words while performing advanced NLP in single-server solutions.
2:55pm–3:35pm Thursday, September 28, 2017
Data science & advanced analytics, Machine Learning & Data Science
Location: 1A 06/07 Level: Intermediate
Michelle Casbon (Google)
Average rating: 4.00 (4 ratings)
Michelle Casbon explores the machine learning and natural language processing that enable teams to build products that feel native to every user and explains how Qordoba is tackling the underserved domain of localization using open source tools, including Kubernetes, Docker, Scala, Apache Spark, Apache Cassandra, and Apache PredictionIO (incubating).
4:35pm–5:15pm Thursday, September 28, 2017
Data science & advanced analytics
Location: 1E 15/16 Level: Intermediate
Noemi Derzsy (Rensselaer Polytechnic Institute)
Open source data has enabled society to engage in community-based research and has given government agencies greater visibility and greater public trust. Noemi Derzsy offers an overview of the openNASA platform and discusses openNASA metadata analysis, along with tools for applying NLP and topic modeling techniques to understand associations among open government datasets.