Strata Data Conference, Sep 23–26, 2019

Practical Feature Engineering

Ted Dunning (MapR)
11:20am–12:00pm Wednesday, September 25, 2019
Location: 1A 12/14

Who is this presentation for?

Data scientists, data engineers, machine learning engineers




Feature engineering is generally the section that gets left out of machine learning books, but it is also the most important part of building successful models, even in today’s world of deep learning. While academic courses on machine learning focus on gradients and the latest flavor of recurrent network, practitioners in the real world are seeking out better features and figuring out how to extract value using a variety of time-honored (and occasionally damned clever) heuristics. In a sense, feature engineering is the Rodney Dangerfield of machine learning, never getting any respect. It is, however, the task that will get you the biggest gain in model performance for the time spent. Nor is this work the data scientist's job alone: good features encode business realities as well, and are the product of good business sense and good data engineering.

Prerequisite knowledge

A basic understanding of how machine learning is used to learn models.

What you'll learn

Attendees will hear about some surprising techniques that can help them solve some really hard problems.

Ted Dunning


Ted Dunning is CTO at MapR. He’s also a board member for the Apache Software Foundation, a PMC member and committer on many Apache projects, and a mentor for various incubator projects. Ted has years of experience with machine learning and other big data solutions across a range of sectors. He has contributed to clustering, classification, and matrix decomposition algorithms in Mahout and to the new Mahout Math library and designed the t-digest algorithm used in several open source projects and by a variety of companies. Previously, Ted was chief architect behind the MusicMatch (now Yahoo Music) and Veoh recommendation systems and built fraud-detection systems for ID Analytics (LifeLock). Ted has coauthored a number of books on big data topics, including several published by O’Reilly related to machine learning, and has 24 issued patents to date plus a dozen pending. He holds a PhD in computing science from the University of Sheffield. When he’s not doing data science, he plays guitar and mandolin. He also bought the beer at the first Hadoop user group meeting.
