How to Determine the Weight Between Words When Building a Word Graph

February 14, 2022
/ eGitty

We can create a word graph based on a sentence, a document or a dataset. Here is the tutorial:

An Introduction to Build a Word Graph From a Document or Dataset

However, how should we determine the weight between words in this word graph? The paper VGCN-BERT: Augmenting BERT with Graph Embedding for Text Classification proposes a method.

Word Weighted Graph

In a word weighted graph, each node is a word, and each edge between two words carries a weight that measures how strongly they are related.

Normalized Point-wise Mutual Information (NPMI)

NPMI is proposed to compute the weight between words based on word co-occurrences in the same sentence. It is defined as follows:

\[NPMI(i,j)=-\frac{1}{\log p(i,j)}\log\frac{p(i,j)}{p(i)p(j)}\]

where \(i\) and \(j\) are words, \(p(i,j) = \frac{\sharp W(i,j)}{\sharp W}\), \(p(i) = \frac{\sharp W(i)}{\sharp W}\), \(\sharp W(*)\) is the number of sliding windows containing a word or a pair of words, and \(\sharp W\) is the total number of sliding windows. To capture long-range dependencies, the paper sets the window to the whole sentence.
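
Three boundary cases help when reading the formula. They are a short derivation from the definition above rather than something quoted from the paper, and they assume \(0 < p < 1\). If two words always appear together, so that \(p(i,j)=p(i)=p(j)=p\):

\[NPMI(i,j)=-\frac{1}{\log p}\log\frac{p}{p \cdot p}=-\frac{1}{\log p}\cdot(-\log p)=1\]

If the two words occur independently, so that \(p(i,j)=p(i)p(j)\):

\[NPMI(i,j)=-\frac{1}{\log p(i,j)}\log 1=0\]

If the two words almost never co-occur, so that \(p(i,j)\to 0\) while \(p(i)\) and \(p(j)\) stay fixed:

\[NPMI(i,j)=-1+\frac{\log\left(p(i)p(j)\right)}{\log p(i,j)}\to -1\]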

Note: some papers use a much smaller window, for example a window size of 2 or 3.
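
To make the definition concrete, here is a minimal Python sketch that computes NPMI for every word pair that co-occurs at least once. It is not the code from the paper. The function name npmi_weights and the toy sentences are made up for illustration, each tokenized sentence is treated as one window, and a pair that appears in every window is assigned 1.0 by convention because the formula divides by zero there.

```python
import math
from collections import Counter
from itertools import combinations


def npmi_weights(sentences):
    """Return {(word_a, word_b): NPMI} for pairs that co-occur at least once.

    Assumption: each sentence (a list of tokens) is one sliding window.
    Pair keys are sorted alphabetically so (a, b) and (b, a) are the same key.
    """
    n_windows = len(sentences)
    word_count = Counter()  # number of windows containing a word
    pair_count = Counter()  # number of windows containing both words

    for tokens in sentences:
        vocab = sorted(set(tokens))  # count each word once per window
        word_count.update(vocab)
        pair_count.update(combinations(vocab, 2))

    weights = {}
    for (a, b), n_ab in pair_count.items():
        p_ab = n_ab / n_windows
        p_a = word_count[a] / n_windows
        p_b = word_count[b] / n_windows
        if p_ab == 1.0:
            # log p(a, b) is 0 here; use the limiting value 1.0 (a convention).
            weights[(a, b)] = 1.0
        else:
            weights[(a, b)] = math.log(p_ab / (p_a * p_b)) / -math.log(p_ab)
    return weights


# Toy corpus, hypothetical data for illustration only.
sentences = [
    ["graph", "neural", "network"],
    ["graph", "bert"],
    ["neural", "network"],
    ["bert", "text"],
]
weights = npmi_weights(sentences)
for pair, score in sorted(weights.items()):
    print(pair, round(score, 3))
```

Pairs that never share a window are simply absent from the result. Their NPMI tends to -1, so they would not become edges anyway.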

How to determine the correlation based on a threshold?

NPMI values fall in the range [-1, 1]. A positive NPMI value implies a high semantic correlation between words, while a negative NPMI value indicates little or no semantic correlation. In the paper's approach, an edge is created between two words if their NPMI is larger than a threshold. The authors' experiments show that performance is better when the threshold is between 0.0 and 0.3. Based on this paper, a threshold of 0.2 is a reasonable choice.
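
The thresholding step can be sketched as follows. This is an illustrative example under stated assumptions, not the paper's implementation: the function name build_edges, the adjacency-dictionary layout, and the NPMI scores below are all hypothetical, and the default threshold of 0.2 is the value suggested above.

```python
def build_edges(npmi_scores, threshold=0.2):
    """Keep only word pairs whose NPMI is larger than the threshold.

    npmi_scores: {(word_a, word_b): NPMI value}
    Returns an undirected graph as {word: {neighbor: weight}}.
    """
    graph = {}
    for (a, b), score in npmi_scores.items():
        if score > threshold:
            # Store the edge in both directions; the NPMI value is the weight.
            graph.setdefault(a, {})[b] = score
            graph.setdefault(b, {})[a] = score
    return graph


# Hypothetical NPMI scores, for illustration only.
scores = {
    ("network", "neural"): 1.0,
    ("bert", "text"): 0.5,
    ("bert", "graph"): 0.0,
    ("graph", "text"): -1.0,
}
graph = build_edges(scores, threshold=0.2)
for word, neighbors in sorted(graph.items()):
    print(word, neighbors)
```

Words whose pairs all fall at or below the threshold, such as "graph" in this toy input, end up with no edges and do not appear in the returned dictionary.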
