Presented By O'Reilly and Cloudera
Make Data Work
22–23 May 2017: Training
23–25 May 2017: Tutorials & Conference
London, UK

Gaining additional labels for data: An introduction to using semisupervised learning for real problems

Yingsong Zhang (ASI Data Science)
9:30–10:00 Tuesday, 23 May 2017
Hardcore Data Science
Location: London Suite 2/3

Training a model requires information to guide it to a useful conclusion. This information often takes the form of human domain knowledge, which mostly appears as labels. However, labels are not always available, and when they are, they are not always in the format we would like. Yingsong Zhang walks you through three situations to illustrate how to apply semisupervised learning to real problems:

  1. Classic semisupervised learning: Labels are available, but gaining more is expensive. Yingsong shows how classic semisupervised wrappers can be used in training CNNs for image segmentation.
  2. Inferring latent labels: Using a training set of 30,000 Twitter messages with two labels, Yingsong demonstrates how to train a classifier to tell whether a message contains “good” questions that lead to a positive labeling.
  3. Semisupervised learning with a human expert: Labels in the ideal format do not exist. Yingsong explains how to create a measure from multiple features that quantifies the severity of an incident, as human experts are not able to score the severity accurately and consistently.
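
The first situation above mentions classic semisupervised wrappers. One common wrapper of this kind is self-training: fit a classifier on the labeled data, pseudo-label the unlabeled points it is most confident about, add them to the training set, and repeat. The sketch below is not taken from the talk. It assumes a simple nearest-centroid classifier in place of a CNN, synthetic 2D data, and an illustrative confidence threshold of 0.9; all function names are hypothetical.

```python
import math
import random


def class_centroids(points, labels):
    """Return {label: mean point} for a list of 2D points."""
    sums = {}
    for (x, y), label in zip(points, labels):
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}


def predict_with_confidence(centroids, point):
    """Return (label, confidence) using a softmax over negative distances."""
    distances = {label: math.dist(point, c) for label, c in centroids.items()}
    nearest = min(distances.values())
    # Subtract the smallest distance so the largest weight is exactly 1.0.
    weights = {label: math.exp(nearest - d) for label, d in distances.items()}
    best = max(weights, key=weights.get)
    return best, weights[best] / sum(weights.values())


def self_train(points, labels, unlabeled, threshold=0.9, max_rounds=10):
    """Self-training wrapper: grow the labeled set with confident pseudo-labels."""
    points, labels, pool = list(points), list(labels), list(unlabeled)
    for _ in range(max_rounds):
        centroids = class_centroids(points, labels)  # refit the base model
        still_unlabeled, added = [], 0
        for p in pool:
            label, confidence = predict_with_confidence(centroids, p)
            if confidence >= threshold:
                points.append(p)       # accept the pseudo-label
                labels.append(label)
                added += 1
            else:
                still_unlabeled.append(p)
        pool = still_unlabeled
        if added == 0:                 # nothing confident left, so stop
            break
    return class_centroids(points, labels), points, labels, pool


# Toy data (assumed): four hand-labeled points and 200 unlabeled points
# drawn from two Gaussian blobs centered at (0, 0) and (4, 4).
random.seed(0)
seed_points = [(0.0, 0.0), (0.5, -0.5), (4.0, 4.0), (3.5, 4.5)]
seed_labels = [0, 0, 1, 1]
unlabeled = [
    (random.gauss(cx, 1.0), random.gauss(cy, 1.0))
    for cx, cy in [(0.0, 0.0), (4.0, 4.0)]
    for _ in range(100)
]

model, final_points, final_labels, remaining = self_train(
    seed_points, seed_labels, unlabeled
)
print(len(final_labels), "labeled points after self-training;",
      len(remaining), "left unlabeled")
```

Raising `threshold` accepts fewer pseudo-labels per round, which lowers the risk of reinforcing early mistakes at the cost of using less of the unlabeled data.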

Yingsong Zhang

ASI Data Science

Yingsong Zhang is a data scientist at ASI, where she has built predictive models on data ranging from social media to specialized client data. Yingsong has published more than 10 first-author research papers in top journals and conferences in signal and image processing and has extensive experience in algorithm design and information representation. She recently completed a three-year postdoc project at Imperial College London, developing sampling theory and its application system. Yingsong holds a BA in mathematics, an MSc in artificial intelligence and pattern recognition from one of China’s top universities, and a PhD in signal and image processing from Cambridge University.

Comments

Yingsong Zhang | DATA SCIENTIST
30/05/2017 10:29 BST

Hi Michal, I have now uploaded my slides to my GitHub: https://github.com/zysalice/strata_2017_selftraining
You will also find the code for the toy example there.

Michał Kucharczyk | BI & RISK MANAGEMENT SPECIALIST
29/05/2017 7:34 BST

Hello again. Unfortunately, the speaker slides section of the Strata webpage doesn’t contain the slides for this talk (or I can’t find them). Could you please provide a download link?

Yingsong Zhang | DATA SCIENTIST
26/05/2017 12:52 BST

Hi all, thank you for your interest in my talk. You can now download the slides through the Strata webpage.

Michał Kucharczyk | BI & RISK MANAGEMENT SPECIALIST
26/05/2017 9:25 BST

Hello Yingsong, do you plan to share the slides?