
Evaluating Sentence Similarity Using BertScore

In this article, we introduce BertScore, a metric for sentence similarity that performs better than plain cosine similarity over sentence vectors.


To evaluate the similarity between two sentences, we usually compute the cosine similarity of their sentence vectors. However, this similarity can perform poorly.
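As a baseline for comparison, here is a minimal sketch of the usual sentence-vector approach; the vectors below are made-up illustrative values, not output of any real encoder:

```python
import numpy as np

def cosine_similarity(u, v):
    # Cosine of the angle between two sentence vectors:
    # dot product divided by the product of the norms.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical sentence vectors for two sentences.
u = np.array([0.2, 0.8, 0.1])
v = np.array([0.25, 0.7, 0.2])
sim = cosine_similarity(u, v)
```

Because the whole sentence is collapsed into one vector before comparison, token-level differences are lost, which is the weakness BertScore addresses.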


BertScore was proposed in the paper BERTScore: Evaluating Text Generation with BERT.

Unlike cosine similarity, it computes sentence similarity based on each token in the sentences.

It contains three metrics: \(R_{bert}\), \(P_{bert}\) and \(F_{bert}\). They can be computed as follows:

\[R_{bert} = \frac{1}{|x|}\sum_{x_i \in x}\max_{\hat{x}_j \in \hat{x}} x_i^{\top}\hat{x}_j\]

\[P_{bert} = \frac{1}{|\hat{x}|}\sum_{\hat{x}_j \in \hat{x}}\max_{x_i \in x} x_i^{\top}\hat{x}_j\]

\[F_{bert} = 2\,\frac{P_{bert}\cdot R_{bert}}{P_{bert} + R_{bert}}\]

Here \(x\) and \(\hat{x}\) are the token sequences of the reference and candidate sentences, and each token is represented by its (normalized) contextual embedding, so the inner products are cosine similarities.
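The greedy matching behind these metrics can be sketched in NumPy. This is a minimal illustration, not the official implementation; it assumes you already have token embedding matrices (e.g. extracted from BERT) for both sentences:

```python
import numpy as np

def bert_score(ref_emb, cand_emb):
    """Sketch of R_bert, P_bert, F_bert from token embeddings.

    ref_emb:  (n_ref, d) token embeddings of the reference sentence
    cand_emb: (n_cand, d) token embeddings of the candidate sentence
    """
    # L2-normalize so dot products are cosine similarities.
    ref = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    cand = cand_emb / np.linalg.norm(cand_emb, axis=1, keepdims=True)
    sim = ref @ cand.T  # (n_ref, n_cand) pairwise similarity matrix

    r = sim.max(axis=1).mean()  # recall: best match for each reference token
    p = sim.max(axis=0).mean()  # precision: best match for each candidate token
    f = 2 * p * r / (p + r)     # harmonic mean
    return r, p, f
```

Each reference token is greedily matched to its most similar candidate token (and vice versa), so the score reflects token-level overlap rather than a single pooled vector.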

Each token representation can be obtained from a contextual model such as BERT or XLNet.

[Figure: BertScore illustration]

Token weight in BertScore

From the formulas above, we can see that all tokens are given the same importance. However, rare words can be more indicative of sentence similarity than common words. The paper therefore proposes weighting each token by its inverse document frequency (IDF) over the reference corpus.

\[\mathrm{idf}(w) = -\log \frac{1}{M}\sum_{i=1}^{M} \mathbb{I}\left[w \in x^{(i)}\right]\]

Here \(M\) is the number of reference sentences.
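A word that appears in many of the \(M\) reference sentences gets a small weight, while a rare word gets a large one. A minimal sketch of this IDF computation, assuming the references are already tokenized into lists of words:

```python
import math
from collections import Counter

def idf_weights(reference_sentences):
    """IDF weight per token: -log(fraction of references containing it).

    reference_sentences: list of token lists; M = len(reference_sentences)
    """
    M = len(reference_sentences)
    doc_freq = Counter()
    for tokens in reference_sentences:
        doc_freq.update(set(tokens))  # count each token once per sentence
    # A token appearing in every reference gets weight -log(1) = 0.
    return {w: -math.log(df / M) for w, df in doc_freq.items()}

# Toy references: "the" is common, "dog" is rare, so "dog" weighs more.
refs = [["the", "cat", "sat"], ["the", "dog", "ran"], ["a", "cat", "ran"]]
weights = idf_weights(refs)
```

In the weighted metrics, each token's similarity term is scaled by its IDF weight and the sums are normalized by the total weight instead of the token count.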


The paper reports experimental results demonstrating the effectiveness of BertScore.

[Figure: BertScore performance results]
