In the machine-learning pipeline, features sit right between raw data and models. Whether the end goal is to classify, cluster, or recommend, choosing the model is only a minor part of the process. Most of the time is spent on feature engineering.
Good features describe important semantic aspects of raw data and are easy for the model to consume. Techniques for extracting good features from text and images are very different. Semantic content is more readily discoverable in natural text than in natural images: words are solid starting points for feature engineering, whereas individual pixels are too low-level.
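To make the point about words as starting points concrete, here is a minimal sketch of a bag-of-words featurizer, one common way to turn text into counts a model can consume. It is an illustration only, not necessarily a method covered in the talk; the function name `bag_of_words`, the simple regex tokenizer, and the sample sentence are all assumptions made for this example.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Count word occurrences in a piece of text.

    Hypothetical helper: lowercases the text, keeps runs of letters and
    apostrophes as tokens, and returns a Counter mapping word -> count.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


# Made-up sample document for illustration.
doc = "The cat sat on the mat. The mat was warm."
features = bag_of_words(doc)

# Each distinct word becomes one feature; its value is the word's count.
print(features.most_common(2))
```

Each word already carries some meaning on its own, so even raw counts give a usable feature vector. The same trick applied to individual pixel values would say very little about what an image contains.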
Alice Zheng leads a tour of popular feature engineering methods for text, logs, and images, giving you an intuitive and actionable understanding of tricks of the trade.
Alice Zheng is a senior manager of applied science on the machine learning optimization team on Amazon’s advertising platform. She specializes in research and development of machine learning methods, tools, and applications. She’s the author of Feature Engineering for Machine Learning. Previously, Alice worked at GraphLab, Dato, and Turi, where she led the machine learning toolkits team and spearheaded user outreach, and was a researcher in the Machine Learning Group at Microsoft Research – Redmond. Alice holds PhD and BA degrees in computer science and a BA in mathematics, all from UC Berkeley.
©2017, O'Reilly Media, Inc.