Sep 23–26, 2019

Practical feature engineering

Ted Dunning (MapR, now part of HPE)
11:20am–12:00pm Wednesday, September 25, 2019
Location: 1A 12/14
Average rating: 5.00 out of 5 (6 ratings)

Who is this presentation for?

  • Data scientists, data engineers, and machine learning engineers

Level

Intermediate

Description

Feature engineering is generally the section that gets left out of machine learning books, but it’s also the most important part of building successful models, even in today’s world of deep learning. While academic courses on machine learning focus on gradients and the latest flavor of recurrent network, Ted Dunning explores the techniques that real-world practitioners use to find better features and extract value from them with a variety of time-honored (and occasionally exceptionally clever) heuristics.

In a sense, feature engineering is the Rodney Dangerfield of machine learning, never getting any respect. It is, however, the task that will get you the most value for time spent in terms of model performance. This work is not just the work of the data scientist. Good features encode business realities as well and are the cross-product of good business sense and good data engineering.

Prerequisite knowledge

  • A basic understanding of how machine learning is used to train models

What you'll learn

  • Learn some surprising techniques that can help you solve some really hard problems

Ted Dunning

MapR, now part of HPE

Ted Dunning is the chief technology officer at MapR, an HPE company. He’s also a board member of the Apache Software Foundation and a PMC member and committer on a number of projects. Ted has years of experience with machine learning and other big data solutions across a range of sectors. He has contributed to clustering, classification, and matrix decomposition algorithms in Mahout and to the new Mahout Math library, and he designed the t-digest algorithm, which is used in several open source projects and by a variety of companies. Previously, Ted was chief architect behind the MusicMatch (now Yahoo Music) and Veoh recommendation systems and built fraud-detection systems for ID Analytics (LifeLock). Ted has coauthored a number of books on big data topics, including several published by O’Reilly related to machine learning, and has 24 issued patents to date plus a dozen pending. He holds a PhD in computing science from the University of Sheffield. When he’s not doing data science, he plays guitar and mandolin. He also bought the beer at the first Hadoop user group meeting.


Comments

Ted Dunning | Chief Technology Officer
09/30/2019 12:49pm EDT

Aditya,

The slides have now been posted by the organizers on the talk page. That is better than tracking down the access issues on the link I sent.

Aditya Thota | Sr. Data Architect
09/30/2019 6:05am EDT

Hi Ted, The link provided below doesn’t open to download the presentation. Can you please provide another link? Thanks

Ted Dunning | Chief Technology Officer
09/27/2019 11:42am EDT

Try https://docs.google.com/presentation/d/1-5Nrdx7b8YjF0sCUSNwQ3TSK7JUcEayxo35RX8pd4Ks/edit?usp=sharing

aaron nematnejad | Data Scientist
09/27/2019 10:56am EDT

Hi Ted.

Do you have a link to your presentation?

Thanks

Aaron Nematnejad
