Presented By O'Reilly and Cloudera
Make Data Work
March 13–14, 2017: Training
March 14–16, 2017: Tutorials & Conference
San Jose, CA

Feature engineering for diverse data types

Alice Zheng (Amazon)
5:10pm–5:50pm Wednesday, March 15, 2017
Secondary topics:  Hardcore Data Science
Average rating: 4.50 (6 ratings)

Who is this presentation for?

  • Beginners in data science

What you'll learn

  • Explore popular feature engineering methods for text, logs, and images


In the machine-learning pipeline, features sit right between raw data and models. Whether the end goal is to classify, cluster, or recommend, the choice of the model is but a minor part of the process. The majority of time is spent on feature engineering.

Good features describe important semantic aspects of raw data and are easy for the model to consume. Techniques for extracting good features from text and images are very different. Semantic content is more readily discoverable in natural text than in natural images: words are solid starting points for feature engineering, whereas individual pixels are too low-level.

Alice Zheng leads a tour of popular feature engineering methods for text, logs, and images, giving you an intuitive and actionable understanding of tricks of the trade.

Topics include:

  • Feature space and model geometry
  • Basic feature engineering for natural text (bag of words)
  • Basic feature transformations for counts (trimming, scaling, and encoding)
  • Classic image feature descriptors (SIFT/HOG)
  • Advanced image features with deep learning
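
As a rough illustration of the two text-and-count topics above, the following is a minimal sketch in plain Python, not material from the session itself. The function names, the sample sentence, and the specific readings of "trimming" (capping counts), "scaling" (log transform), and "encoding" (binarizing to presence or absence) are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Map a document to word counts; lowercases and keeps runs of letters."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def trim(counts, max_count):
    """Assumed reading of 'trimming': cap each count at max_count."""
    return {word: min(c, max_count) for word, c in counts.items()}


def log_scale(counts):
    """Assumed reading of 'scaling': compress counts with log(1 + c)."""
    return {word: math.log1p(c) for word, c in counts.items()}


def binarize(counts):
    """Assumed reading of 'encoding': keep only presence (1) per word."""
    return {word: 1 for word, c in counts.items() if c > 0}


# Hypothetical sample document for illustration only.
doc = "the cat sat on the mat and the dog sat too"
bow = bag_of_words(doc)

print(bow["the"], bow["sat"], bow["cat"])  # 3 2 1
print(trim(bow, 2)["the"])                 # 2
print(binarize(bow)["the"])                # 1
```

Each transformation returns a new dictionary, so the raw counts stay available for comparison. In practice a library vectorizer would usually replace the hand-rolled tokenizer.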

Alice Zheng


Alice Zheng is a senior manager of applied science on the machine learning optimization team on Amazon’s advertising platform. She specializes in research and development of machine learning methods, tools, and applications, and is the author of Feature Engineering for Machine Learning. Previously, Alice worked at GraphLab, Dato, and Turi, where she led the machine learning toolkits team and spearheaded user outreach, and was a researcher in the Machine Learning Group at Microsoft Research – Redmond. Alice holds PhD and BA degrees in computer science and a BA in mathematics, all from UC Berkeley.