Agreement loss is a regularization method proposed in the paper Learning from Noisy Labels for Entity-Centric Information Extraction.
In this approach, \(M\) models (\(M \geq 2\)) are trained, producing \(M\) predicted distributions.
We can implement agreement loss as follows:
Step 1: averaging the M predictions
For each example \(i\), average the distributions predicted by the M models: \(q_i = \frac{1}{M}\sum_{m=1}^{M} p_i^{(m)}\). Alternatively, max-pooling or a self-attention mechanism can be used to compute \(q_i\).
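Step 1 can be sketched as follows. This is a minimal NumPy illustration (the helper name `average_predictions` is mine, not from the paper):

```python
import numpy as np

def average_predictions(preds):
    """Aggregate M predicted distributions into q_i by simple averaging.

    preds: array-like of shape (M, num_classes), one probability
    distribution per model. Returns q_i of shape (num_classes,).
    """
    preds = np.asarray(preds, dtype=float)
    return preds.mean(axis=0)

# Example: M = 2 models over 3 classes
p1 = np.array([0.7, 0.2, 0.1])
p2 = np.array([0.5, 0.3, 0.2])
q = average_predictions([p1, p2])
print(q)  # [0.6  0.25 0.15]
```

Max-pooling would replace `mean(axis=0)` with `max(axis=0)` (followed by renormalization so the result is still a distribution).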
Step 2: building the agreement loss
Here \(\epsilon\) is a small positive number that avoids division by zero.
The agreement loss pushes the M models' predicted distributions toward each other.
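A sketch of Step 2, assuming the agreement loss is the mean KL divergence from the averaged distribution \(q_i\) to each model's prediction (the exact formulation in the paper may differ; \(\epsilon\) guards the division inside the log):

```python
import numpy as np

def agreement_loss(preds, eps=1e-8):
    """Mean KL divergence KL(q || p^(m)) over the M models.

    preds: array-like of shape (M, num_classes).
    eps: small constant to avoid division by zero in the log ratio.
    """
    preds = np.asarray(preds, dtype=float)   # (M, C)
    q = preds.mean(axis=0)                   # averaged distribution, (C,)
    # KL(q || p_m) = sum_c q_c * log(q_c / (p_mc + eps)), one value per model
    kl = np.sum(q * np.log((q + eps) / (preds + eps)), axis=1)  # (M,)
    return float(kl.mean())

# Identical predictions -> loss is (approximately) zero
print(agreement_loss([[0.5, 0.5], [0.5, 0.5]]))
# Disagreeing predictions -> strictly positive loss
print(agreement_loss([[0.9, 0.1], [0.1, 0.9]]))
```

Minimizing this term is what makes the M predicted distributions similar: the KL terms vanish only when every \(p^{(m)}\) matches \(q\).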
Step 3: building the final loss
The final loss can be written as:
L = average_model_loss + λ * L_agg
where average_model_loss is the mean of the M models' task losses, L_agg is the agreement loss, and λ is a hyperparameter; λ = 1 is a reasonable default.