Presented By O'Reilly and Cloudera
Make Data Work
March 13–14, 2017: Training
March 14–16, 2017: Tutorials & Conference
San Jose, CA

Feature engineering for diverse data types

Alice Zheng (Amazon)
5:10pm–5:50pm Wednesday, March 15, 2017
Secondary topics: Hardcore Data Science
Average rating: 4.50 (6 ratings)

Who is this presentation for?

  • Beginners in data science

What you'll learn

  • Explore popular feature engineering methods for text, logs, and images

Description

In the machine-learning pipeline, features sit right between raw data and models. Whether the end goal is to classify, cluster, or recommend, the choice of the model is but a minor part of the process. The majority of time is spent on feature engineering.

Good features describe important semantic aspects of raw data and are easy for the model to consume. Techniques for extracting good features from text and images are very different. Semantic content is more readily discoverable in natural text than natural images: words are solid starting points for feature engineering whereas individual pixels are too low level.

Alice Zheng leads a tour of popular feature engineering methods for text, logs, and images, giving you an intuitive and actionable understanding of tricks of the trade.

Topics include:

  • Feature space and model geometry
  • Basic feature engineering for natural text (bag of words)
  • Basic feature transformations for counts (trimming, scaling, and encoding)
  • Classic image feature descriptors (SIFT/HOG)
  • Advanced image features with deep learning
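
As a rough illustration of two of the topics above (bag of words for text, then trimming, scaling, and encoding of the resulting counts), here is a minimal sketch in plain Python. It is not taken from the talk: the function names, the tokenization rule, and the `min_count` threshold are all assumptions chosen for the example.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, pull out runs of letters, and count each token."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)


def trim_rare(counts, min_count=2):
    """Trimming: drop tokens seen fewer than min_count times (hypothetical threshold)."""
    return {word: n for word, n in counts.items() if n >= min_count}


def log_scale(counts):
    """Scaling: compress large counts with log(1 + n)."""
    return {word: math.log1p(n) for word, n in counts.items()}


def binarize(counts):
    """Encoding: keep only presence (1) for tokens with a positive count."""
    return {word: 1 for word, n in counts.items() if n > 0}


doc = "the cat sat on the mat and the dog sat too"
counts = bag_of_words(doc)     # raw token counts
trimmed = trim_rare(counts)    # only tokens that occur at least twice
scaled = log_scale(trimmed)    # log-scaled counts
present = binarize(counts)     # presence/absence encoding
```

Each step maps a dictionary of counts to another dictionary, so the transformations can be chained or swapped without changing the rest of the pipeline.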

Alice Zheng

Amazon

Alice Zheng manages the optimization team on Amazon’s Ad Platform. Alice specializes in research and development of machine-learning methods, tools, and applications. Outside of work, she is writing a book, Mastering Feature Engineering. Previously, Alice worked at GraphLab/Dato/Turi, where she led the machine-learning toolkits team and spearheaded user outreach, was a researcher in the Machine Learning group at Microsoft Research, Redmond, and was a postdoc at Carnegie Mellon University. Alice holds PhD and BA degrees in computer science and a BA in mathematics, all from UC Berkeley.
