
Understand Multi-channel Features for Sentiment Classification

There is a great deal of linguistic knowledge and many sentiment resources available nowadays, such as sentiment lexicons. We can use these resources to improve sentiment classification.
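For example, here is a minimal Python sketch (with a hypothetical toy lexicon; the words and scores are illustrative, not from any real resource) of how a sentiment lexicon can be turned into a per-token polarity feature:

# A toy sentiment lexicon mapping words to polarity scores.
# These entries are hypothetical examples.
lexicon = {"good": 1.0, "great": 1.0, "bad": -1.0, "terrible": -1.0}

def lexicon_features(tokens):
    """Return one polarity score per token (0.0 for unknown words)."""
    return [lexicon.get(t.lower(), 0.0) for t in tokens]

print(lexicon_features(["The", "movie", "was", "great"]))
# [0.0, 0.0, 0.0, 1.0] -- this score can be appended to each word embedding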

Here is a paper on using a CNN and a sentiment lexicon for this task.

CNN + Sentiment Lexicon for Sentiment Analysis

Multi-channel Features for Sentiment Classification

We can view different kinds of knowledge or resources as different feature channels. The key point in using these additional features to improve model performance is how to incorporate the multi-channel features into your model.

There are three basic methods we can use; here is an example:

3 Methods to Incorporate Multiple Features in Deep Learning

The paper Bidirectional LSTM with self-attention mechanism and multi-channel features for sentiment classification uses a different method.

Look at the architecture of the model in this paper.

Figure: Multi-channel Features for Sentiment Classification (model architecture)

In this paper, word embeddings, a part-of-speech feature, a position feature, and a dependency parsing feature are used.

From the image above, we can see:

As model inputs, the additional features are concatenated with the word embeddings. In this paper, this yields three feature channels.

Of course, we could also concatenate the word embeddings and all the other features into a single channel, so the model input would be: [word embedding vector, part-of-speech vector, position vector, dependency parsing vector]. However, this single-channel model performs much worse than the three-channel one.
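Here is a minimal PyTorch sketch of both options. The dimensions and the random tensors standing in for real looked-up embeddings are assumptions, not the paper's exact settings:

import torch

batch, seq_len = 2, 10
word_dim, feat_dim = 300, 50

# Assumed toy tensors standing in for looked-up embeddings.
word_emb = torch.randn(batch, seq_len, word_dim)   # word embeddings
pos_tag  = torch.randn(batch, seq_len, feat_dim)   # part-of-speech feature
position = torch.randn(batch, seq_len, feat_dim)   # position feature
dep      = torch.randn(batch, seq_len, feat_dim)   # dependency parsing feature

# Three channels: word embedding concatenated with one feature each.
channel1 = torch.cat([word_emb, pos_tag], dim=-1)   # (batch, seq_len, 350)
channel2 = torch.cat([word_emb, position], dim=-1)
channel3 = torch.cat([word_emb, dep], dim=-1)

# Single-channel alternative: everything in one input vector.
single = torch.cat([word_emb, pos_tag, position, dep], dim=-1)  # (batch, seq_len, 450)

Each channel is then fed to its own BiLSTM encoder, while the single-channel variant would use only one.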

Self-attention in Multi-channel Features

We usually compute self-attention on each channel separately and finally merge the final vectors of all channels for classification. However, this paper uses a different approach.

At the word layer, it uses a crossover method.

Figure: self-attention on multi-channel features

Here \(V_{LN1}\) is the normalized output of the BiLSTM, \(Tag^m\) is the part-of-speech feature, and \(V_{LN2}\) and \(V_{LN3}\) are the normalized outputs of the BiLSTM with the additional position and dependency parsing features.

We can see that there are two key points:

  • Concatenating the inputs and the normalized outputs of the BiLSTM for the self-attention computation. This works better than using only the normalized outputs.
  • Transferring the concatenated vectors of the other channels' feature outputs into the self-attention computation; see the sketch after this list.
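Here is a minimal PyTorch sketch of these two points for channel 1, assuming plain scaled dot-product self-attention; the paper's exact attention formulation may differ:

import torch
import torch.nn.functional as F

batch, seq_len, hidden = 2, 10, 128
feat_dim = 50

# Assumed stand-ins: normalized BiLSTM outputs of the three channels
# and the part-of-speech feature input of channel 1.
V_LN1 = torch.randn(batch, seq_len, hidden)    # channel 1 (part-of-speech)
V_LN2 = torch.randn(batch, seq_len, hidden)    # channel 2 (position)
V_LN3 = torch.randn(batch, seq_len, hidden)    # channel 3 (dependency parsing)
Tag_m = torch.randn(batch, seq_len, feat_dim)  # part-of-speech feature input

# Key point 1: concatenate the channel's input with its normalized
# BiLSTM output instead of attending over the output alone.
# Key point 2: also bring in the other channels' normalized outputs.
crossover = torch.cat([V_LN1, Tag_m, V_LN2, V_LN3], dim=-1)

# Scaled dot-product self-attention over the concatenated vectors.
scores = torch.bmm(crossover, crossover.transpose(1, 2))
scores = scores / crossover.size(-1) ** 0.5
attn = F.softmax(scores, dim=-1)          # (batch, seq_len, seq_len)
attended = torch.bmm(attn, crossover)     # attended channel-1 representation

The same pattern would be repeated for the other two channels, swapping which feature input and which outputs are concatenated.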

The paper also evaluated the performance of different feature combinations in self-attention; we can see:

Figure: self-attention performance on multi-channel features
