Reverse attention can be used to strip style information from a sentence and generate a style-independent content representation, as proposed in the paper: Enhancing Content Preservation in Text Style Transfer Using Reverse Attention and Conditional Layer Normalization.
Reverse Attention
It is defined as:
\( \tilde{a}_t = 1 - a_t \)
where \(a_t\) is the attention weight of token \(t\) and \( \sum_{t=1}^T a_t = 1 \).
We should notice:
\( \sum_{t=1}^T \tilde{a}_t \neq 1 \)
In fact \( \sum_{t=1}^T \tilde{a}_t = T - 1 \), so the reverse attention weights do not form a probability distribution.
In order to compute reverse attention, we first need attention weights \(a_t\); in the paper these come from a pre-trained style classifier, so tokens with high attention are the ones carrying style information.
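As a minimal PyTorch sketch of this step (the attention weights below are random stand-ins for a style classifier's attention, and all sizes are made up):

```python
import torch

T = 6                              # sequence length (arbitrary)
scores = torch.randn(T)            # stand-in for style-classifier attention logits
a = torch.softmax(scores, dim=0)   # attention weights a_t, which sum to 1

a_rev = 1.0 - a                    # reverse attention: \tilde{a}_t = 1 - a_t
print(a.sum().item())              # 1.0
print(a_rev.sum().item())          # T - 1 = 5.0, not 1: not normalized
```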
How do we get a style-independent content representation?
The simplest way is to use a sequence model (a GRU, a BiLSTM, etc.) to encode the reverse-attention output, i.e., the re-weighted embeddings \( \tilde{e}_t = \tilde{a}_t \, e_t \), where \(e_t\) is the embedding of token \(t\).
For example:
\( z_x = \operatorname{BiGRU}(\tilde{e}) \)
Here \(z_x\) is the style-independent content representation we want.
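A minimal PyTorch sketch of this encoding step (the batch size, dimensions, and the choice of concatenating the final forward and backward hidden states are assumptions for illustration, not the paper's exact setup):

```python
import torch
import torch.nn as nn

B, T, d, h = 2, 6, 32, 64                     # batch, seq len, emb dim, hidden dim (arbitrary)
e = torch.randn(B, T, d)                      # token embeddings e_t
a = torch.softmax(torch.randn(B, T), dim=-1)  # stand-in for style-classifier attention a_t

a_rev = 1.0 - a                               # reverse attention \tilde{a}_t = 1 - a_t
e_rev = a_rev.unsqueeze(-1) * e               # re-weighted embeddings \tilde{e}_t

gru = nn.GRU(input_size=d, hidden_size=h, bidirectional=True, batch_first=True)
out, h_n = gru(e_rev)                         # out: (B, T, 2h); h_n: (2, B, h)

# One common way to pool a BiGRU into a single vector:
# concatenate the final forward and backward hidden states.
z_x = torch.cat([h_n[0], h_n[1]], dim=-1)     # (B, 2h): content representation z_x
```

Any other pooling over `out` (e.g., mean pooling) would serve equally well here; the point is only that a sequence encoder over \( \tilde{e} \) yields \(z_x\).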