
An Introduction to Combining Multiple Features with BERT for Classification

BERT is widely used in text classification. Here is an example:

An Implementation of Bert for Sentiment Analysis Using PyTorch

However, BERT can only extract text features from a text sequence. If you have other features, how can you combine them with the BERT output for classification?

The paper Enriching BERT with Knowledge Graph Embeddings for Document Classification gives us some suggestions.

In this paper, these external features are used:

• Number of authors.
• Academic title (Dr. or Prof.), if found in author names (0 or 1).
• Number of words in title.
• Number of words in blurb.
• Length of longest word in blurb.
• Mean word length in blurb.
• Median word length in blurb.
• Age in years after publication date.
• Probability of first author being male or female.
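Most of these features are simple statistics over the title, blurb, and author list. As a rough illustration, here is a minimal Python sketch that computes several of them; the function name and exact definitions (e.g. how an academic title is detected) are my assumptions, not the paper's code:

```python
import statistics

def metadata_features(title, blurb, authors):
    """Compute simple metadata features like those listed above.

    This is an illustrative sketch; the paper's exact preprocessing
    may differ. The gender-probability and publication-age features
    need external data, so they are omitted here.
    """
    title_words = title.split()
    blurb_words = blurb.split()
    word_lengths = [len(w) for w in blurb_words]
    return {
        "num_authors": len(authors),
        # 1 if any author name contains an academic title, else 0
        "has_academic_title": int(any(t in a for a in authors
                                      for t in ("Dr.", "Prof."))),
        "title_word_count": len(title_words),
        "blurb_word_count": len(blurb_words),
        "longest_word_len": max(word_lengths, default=0),
        "mean_word_len": statistics.mean(word_lengths) if word_lengths else 0.0,
        "median_word_len": statistics.median(word_lengths) if word_lengths else 0.0,
    }
```

These values can then be stacked into a small numeric vector per document.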

It is easy to see that these features cannot be obtained by BERT.

How do we combine multiple features with BERT for document classification?

In this paper, the external features are concatenated with the BERT output for classification.

Here is the structure of the model.


We should notice that a 2-layer MLP is used in this paper.
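The concatenation plus 2-layer MLP can be sketched in PyTorch as follows. To keep the sketch self-contained, the BERT output is assumed to be an already-pooled 768-dimensional vector; the hidden size and class count are placeholders, not the paper's exact values:

```python
import torch
import torch.nn as nn

class BertWithMetadata(nn.Module):
    """Sketch: concatenate a pooled BERT vector with metadata features,
    then classify with a 2-layer MLP. Sizes are illustrative assumptions."""

    def __init__(self, bert_dim=768, num_extra=9, hidden=256, num_classes=8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(bert_dim + num_extra, hidden),  # layer 1
            nn.ReLU(),
            nn.Linear(hidden, num_classes),           # layer 2
        )

    def forward(self, bert_output, extra_features):
        # bert_output: (batch, bert_dim) pooled vector from BERT
        # extra_features: (batch, num_extra) metadata feature vector
        x = torch.cat([bert_output, extra_features], dim=1)
        return self.mlp(x)
```

In practice the pooled vector would come from a pretrained BERT model (e.g. via Hugging Face Transformers), and the metadata features should be normalized before concatenation.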

How well does concatenating the BERT output with other features perform on classification problems?

The paper reports some comparative results. For example:


We can find that the model with German BERT, metadata, and author information achieved the highest F1 score.

However, we can also see that this method is only slightly better than BERT alone.

I think this is because the dimension of the external features is too small: the BERT output is 768-dimensional, while the external features are only a handful of values, so they contribute very little to the concatenated vector.
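One possible remedy for this imbalance (my own suggestion, not something the paper does) is to project the small feature vector up to the same dimension as the BERT output before concatenating, so both inputs contribute comparably many dimensions:

```python
import torch
import torch.nn as nn

class FeatureProjector(nn.Module):
    """Sketch (an assumption, not from the paper): linearly project the
    small metadata vector up to the BERT output dimension so that both
    parts of the concatenation have comparable size."""

    def __init__(self, num_extra=9, bert_dim=768):
        super().__init__()
        self.proj = nn.Linear(num_extra, bert_dim)

    def forward(self, extra_features):
        # extra_features: (batch, num_extra) -> (batch, bert_dim)
        return torch.relu(self.proj(extra_features))
```

Whether this actually helps would need to be verified experimentally.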

How to improve this method?

The key question for this method is: can the external features classify documents on their own, without the BERT output? If so, these features can improve BERT's performance.

Meanwhile, are these features equally important? If not, we can use an attention mechanism to weight them.
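The attention idea can be sketched minimally as follows (again an assumption, not from the paper): learn a per-feature importance weight and rescale the metadata vector before it is concatenated with the BERT output:

```python
import torch
import torch.nn as nn

class FeatureAttention(nn.Module):
    """Sketch (an assumption, not from the paper): compute softmax
    attention weights over the external features and rescale them,
    so more informative features contribute more."""

    def __init__(self, num_extra=9):
        super().__init__()
        self.score = nn.Linear(num_extra, num_extra)

    def forward(self, extra_features):
        # weights: (batch, num_extra), rows sum to 1
        weights = torch.softmax(self.score(extra_features), dim=1)
        return extra_features * weights
```

The reweighted vector would then replace the raw features in the concatenation step.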
