eGitty


A Beginner's Guide to Using BERT for Multi-Task Learning

This tutorial discusses how to use the BERT model for multi-task learning. You can build your own custom model based on this post.

The paper Multi-Task Deep Neural Networks for Natural Language Understanding (MT-DNN) proposes an algorithm for building a multi-task learning model. Here is the algorithm:

[Figure: the MT-DNN algorithm for building a multi-task learning model]

From this algorithm, we can observe:

1. Each task dataset \(D_t\) is packed into mini-batches.

2. The mini-batches from all datasets are merged and shuffled into \(D\).

3. In the loop for \(b_t\) in \(D\), each training step uses a mini-batch from a single task.

4. When computing the loss \(L\) for a step, note that each task has its own task-specific loss. For example, a mini-batch from a classification task should be scored with the classification loss, not the ranking loss.
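The steps above can be sketched in plain Python. The task names, dataset sizes, batch size, and toy loss functions below are illustrative assumptions, not values from the MT-DNN paper; the point is the structure of the loop, where each step dispatches to the loss of the batch's own task.

```python
import random

# Hypothetical toy datasets: each task has its own dataset (assumption).
datasets = {
    "classification": list(range(8)),   # 8 toy examples
    "ranking": list(range(6)),          # 6 toy examples
}

def make_batches(data, batch_size):
    """Step 1: pack one task's dataset into mini-batches."""
    return [data[i:i + batch_size] for i in range(0, len(data), batch_size)]

batch_size = 2
# Step 2: merge the mini-batches of all datasets and shuffle them,
# tagging each batch with its task so we know which loss to apply later.
merged = []
for task, data in datasets.items():
    merged.extend((task, b) for b in make_batches(data, batch_size))
random.shuffle(merged)

# Hypothetical per-task losses (assumptions); in MT-DNN these would be,
# e.g., cross-entropy for classification and a pairwise ranking loss.
def classification_loss(batch):
    return float(len(batch))

def ranking_loss(batch):
    return float(len(batch)) * 0.5

loss_fns = {"classification": classification_loss, "ranking": ranking_loss}

# Steps 3-4: each training step uses one batch from one task, and the
# loss is computed with that task's own loss function.
for task, batch in merged:
    loss = loss_fns[task](batch)
    # ...backpropagate `loss` through the shared model here...
```

Tagging each batch with its task name is what resolves the question in point 4: a classification batch can never reach the ranking loss.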

BERT for Multi-Task Learning

We can also use BERT for multi-task learning. The architecture looks like this:

[Figure: BERT for multi-task learning architecture]

Then, we can use the BERT model as the shared layers, with a task-specific output layer on top for each task.
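A minimal PyTorch sketch of this architecture follows. The encoder here is a small stand-in linear layer, not a real BERT; in practice you would load a pretrained BERT encoder (e.g. from the transformers library) in its place. The hidden size, input size, and task names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiTaskModel(nn.Module):
    """A shared encoder with one output head per task."""

    def __init__(self, input_size=8, hidden_size=16, num_classes=3):
        super().__init__()
        # Stand-in for the shared BERT layers (assumption: a real model
        # would use a pretrained BERT encoder here).
        self.shared_encoder = nn.Linear(input_size, hidden_size)
        # Task-specific heads on top of the shared representation.
        self.heads = nn.ModuleDict({
            "classification": nn.Linear(hidden_size, num_classes),
            "ranking": nn.Linear(hidden_size, 1),
        })

    def forward(self, x, task):
        h = torch.relu(self.shared_encoder(x))
        return self.heads[task](h)

model = MultiTaskModel()
x = torch.randn(4, 8)                 # a toy batch of 4 examples
cls_out = model(x, "classification")  # shape (4, 3)
rank_out = model(x, "ranking")        # shape (4, 1)
```

Because all heads read the same shared representation, gradients from every task's loss update the shared encoder, which is the point of the multi-task setup.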

When training, we can set the learning rate to 5e-5 with a batch size of 32, use a linear learning-rate decay schedule with warm-up over the first 10% of training steps, and train for up to 5 epochs.
