An Introduction to Pseudo Label for Text Classification

July 27, 2022
/ eGitty

The pseudo-label method is a simple but effective semi-supervised learning technique for improving text classification. In this tutorial, we introduce it for beginners.

Pseudo label

To understand the pseudo-label method, we first need to distinguish two kinds of sets.

labeled set: each example carries a target label, usually assigned by a human.
unlabeled set: its examples have no labels. Many sets of this kind exist.

Usually, we use the labeled set to train a model with cross-entropy loss. This model is often called the teacher model.

Once we have a teacher model, we can use it to infer a label for each example in the unlabeled set. That inferred label is the pseudo label.
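
The two steps above (train a teacher on the labeled set, then infer labels for the unlabeled set) can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions: the toy sentences, the bag-of-words features, the softmax classifier, and the names `train_teacher` and `predict_proba` are all hypothetical and are not taken from the paper discussed later.

```python
import math
from collections import Counter

def featurize(text, vocab):
    """Bag-of-words count vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(weights, x):
    # weights[c] holds one weight per feature plus a trailing bias term
    scores = [sum(w * v for w, v in zip(wc, x)) + wc[-1] for wc in weights]
    return softmax(scores)

def train_teacher(X, y, num_classes, epochs=200, lr=0.1):
    """Softmax regression trained with cross-entropy loss via SGD."""
    dim = len(X[0])
    weights = [[0.0] * (dim + 1) for _ in range(num_classes)]
    for _ in range(epochs):
        for x, label in zip(X, y):
            probs = predict_proba(weights, x)
            for c in range(num_classes):
                # gradient of cross-entropy w.r.t. the score of class c
                grad = probs[c] - (1.0 if c == label else 0.0)
                for j, v in enumerate(x):
                    weights[c][j] -= lr * grad * v
                weights[c][-1] -= lr * grad
    return weights

# Toy labeled set (1 = positive, 0 = negative) and unlabeled set (assumed data)
labeled = [
    ("great movie loved it", 1),
    ("wonderful acting great plot", 1),
    ("terrible movie hated it", 0),
    ("boring plot awful acting", 0),
]
unlabeled = ["loved the wonderful plot", "awful and boring movie"]

vocab = sorted({w for text, _ in labeled for w in text.split()})
X = [featurize(text, vocab) for text, _ in labeled]
y = [label for _, label in labeled]

teacher = train_teacher(X, y, num_classes=2)

# Pseudo label = the class the teacher considers most probable
pseudo_labeled = []
for text in unlabeled:
    probs = predict_proba(teacher, featurize(text, vocab))
    label = max(range(len(probs)), key=probs.__getitem__)
    pseudo_labeled.append((text, label, probs[label]))

for text, label, confidence in pseudo_labeled:
    print(f"{text!r} -> pseudo label {label} (confidence {confidence:.2f})")
```

Keeping the teacher's confidence next to each pseudo label is a design choice in this sketch: it would allow low-confidence predictions to be filtered out later, which is a common practice but not something the text above requires.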

How to use pseudo label to improve text classification?

We can get some useful information from the paper: Towards Unifying the Label Space for Aspect- and Sentence-based Sentiment Analysis.

Formally, let \(D\) be the labeled set and \(D_u\) the unlabeled set. We use \(D\) to train a teacher model with cross-entropy loss.
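
For reference, the standard cross-entropy objective for the teacher can be written as follows. The symbols \(\theta\) (teacher parameters) and \(p_\theta(y \mid x)\) (the probability the teacher assigns to label \(y\) for text \(x\)) are notation assumed here for illustration and may differ from the paper's own notation.

\[
\mathcal{L}_{\text{teacher}}(\theta) = -\frac{1}{|D|} \sum_{(x, y) \in D} \log p_\theta(y \mid x)
\]

Minimizing this loss pushes the teacher to assign high probability to the human-provided label of every example in \(D\).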

Then we can generate pseudo labels as below and use them to improve the cross-entropy loss for text classification.