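Attentive Statistics Pooling was proposed in the paper "Attentive Statistics Pooling for Deep Speaker Embedding". It turns a variable-length sequence of frame-level features into a single fixed-length, utterance-level vector. In this article, we will introduce it for beginners.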
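Given frame-level features \(h_1, \dots, h_T\), it first computes a scalar score \(e_t\) for each frame. It then normalizes the scores over time with a softmax to obtain the attention weights \(\alpha_t\):
\[e_t = v^{T} f(W h_t + b) + k\]
\[\alpha_t = \frac{\exp(e_t)}{\sum_{\tau=1}^{T} \exp(e_\tau)}\]
where \(W\), \(b\), \(v\) and \(k\) are learnable parameters.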
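To make the scoring step concrete, here is a minimal pure-Python sketch of how the weights \(\alpha_t\) could be computed. It assumes the score form \(e_t = v^{T} f(W h_t + b) + k\) followed by a softmax over time, with tanh as the default \(f\). The function name `attention_weights` and its argument layout are hypothetical and chosen only for illustration. In a real model, \(W\), \(b\), \(v\) and \(k\) are learned during training rather than passed in by hand.

```python
import math


def attention_weights(h, W, b, v, k, f=math.tanh):
    """Hypothetical sketch: normalized attention weights alpha_t for frames h.

    h: list of T frame vectors (each a list of length D)
    W: hidden-size x D matrix given as a list of rows
    b: bias vector of length hidden-size
    v: vector of length hidden-size
    k: scalar bias
    f: element-wise non-linearity (tanh by default; ReLU also works)
    """
    scores = []
    for h_t in h:
        # hidden = f(W h_t + b), computed one row of W at a time
        hidden = [f(sum(w_ij * x_j for w_ij, x_j in zip(row, h_t)) + b_i)
                  for row, b_i in zip(W, b)]
        # e_t = v^T hidden + k
        scores.append(sum(v_i * z_i for v_i, z_i in zip(v, hidden)) + k)
    # Softmax over time; subtracting the max keeps exp() from overflowing
    m = max(scores)
    exps = [math.exp(e - m) for e in scores]
    total = sum(exps)
    return [x / total for x in exps]
```

A softmax is unchanged when the same constant is added to every score, so the scalar \(k\) does not affect the resulting weights. Passing `f=lambda x: max(0.0, x)` switches the activation to ReLU.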
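Attentive Statistics Pooling is an attention-based pooling method. Conventional attentive pooling outputs only the weighted mean of the sequence. Attentive Statistics Pooling also uses the weighted standard deviation, so the pooled vector captures how the frame-level features vary over time as well as their average.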
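Using the attention weights \(\alpha_t\) of the frame-level features \(h_t\), the weighted mean and weighted standard deviation are computed as:
\[\widetilde{\mu} = \sum_{t=1}^{T} \alpha_t h_t\]
\[\widetilde{\sigma} = \sqrt{\sum_{t=1}^{T} \alpha_t\, h_t \odot h_t - \widetilde{\mu} \odot \widetilde{\mu}}\]
where \(\odot\) denotes the element-wise product and the square root is also taken element-wise.
The function \(f\) used when computing the attention scores is a non-linear activation function, such as tanh or ReLU.
The final output is the concatenation of the weighted mean and the weighted standard deviation, \([\widetilde{\mu}:\widetilde{\sigma}]\).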
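The pooling step can be sketched in the same hedged way. The hypothetical function below takes frame vectors \(h_t\) and weights \(\alpha_t\), which are assumed to be non-negative and to sum to 1 (for example, the output of a softmax). It computes the weighted mean and weighted standard deviation one dimension at a time and returns their concatenation. This is an illustrative sketch, not the paper's reference code.

```python
import math


def attentive_statistics_pooling(h, alpha):
    """Hypothetical sketch: pool T frames into [weighted mean : weighted std].

    h: list of T frame vectors (each a list of length D)
    alpha: T attention weights, assumed non-negative and summing to 1
    Returns a list of length 2 * D.
    """
    dim = len(h[0])
    # Weighted mean: mu[d] = sum_t alpha_t * h_t[d]
    mu = [sum(a * h_t[d] for a, h_t in zip(alpha, h)) for d in range(dim)]
    # Weighted variance: sum_t alpha_t * h_t[d]^2 - mu[d]^2
    var = [sum(a * h_t[d] ** 2 for a, h_t in zip(alpha, h)) - mu[d] ** 2
           for d in range(dim)]
    # Clamp tiny negative values caused by floating-point rounding, then sqrt
    sigma = [math.sqrt(max(x, 0.0)) for x in var]
    # Concatenate mean and standard deviation
    return mu + sigma
```

For example, `attentive_statistics_pooling([[1.0, 2.0], [3.0, 2.0]], [0.5, 0.5])` returns `[2.0, 2.0, 1.0, 0.0]`. With uniform weights \(\alpha_t = 1/T\), as here, the result reduces to ordinary (unweighted) statistics pooling. The clamp at zero only guards against rounding. Because the gradient of the square root is unbounded at zero, trainable implementations often clamp the variance to a small positive value instead.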