BERT or ELMo can generate word representations for a sentence. However, do these representations encode syntactic information? The paper A Structural Probe for Finding Syntax in Word Representations gives us an answer.
From this paper, we can find:
Syntax trees are embedded in a linear transformation of BERT or ELMo’s word representation space.
How to evaluate the distance between words?
In order to compute the word distance in a parse tree, we can do as follows:
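In the paper's notation (restated here from memory, so treat the exact symbols as an assumption), the tree distance \(d_T(w_i, w_j)\) is the number of edges on the path between words \(w_i\) and \(w_j\) in the parse tree. The probe approximates it with a squared distance between the word representations \(h_i\) and \(h_j\) after applying a linear map \(B\):

\[
d_B(h_i, h_j)^2 = \big(B(h_i - h_j)\big)^T \big(B(h_i - h_j)\big)
\]

If the probe works well, \(d_B(h_i, h_j)^2\) is close to \(d_T(w_i, w_j)\) for every pair of words in the sentence.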
Then, we can use gold parse trees to train the weight matrix \(B\), so that the predicted distances match the tree distances as closely as possible.
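As a rough sketch of what training optimizes, the snippet below computes the predicted squared distances for all word pairs and the mean absolute gap to the tree distances for one sentence. The function names `probe_distances` and `probe_loss` and the toy numbers are made up for illustration; this is not the authors' code, and a real setup would update \(B\) by gradient descent over many sentences.

```python
import numpy as np

def probe_distances(H, B):
    """Squared distances ||B(h_i - h_j)||^2 for every word pair.

    H: (n_words, dim) word representations; B: (rank, dim) probe matrix.
    """
    T = H @ B.T                            # transformed vectors, (n_words, rank)
    diff = T[:, None, :] - T[None, :, :]   # pairwise differences
    return (diff ** 2).sum(axis=-1)        # (n_words, n_words)

def probe_loss(H, B, tree_dist):
    """Mean absolute gap between tree distances and predicted distances."""
    n = H.shape[0]
    return np.abs(tree_dist - probe_distances(H, B)).sum() / (n * n)

# Toy example (made-up numbers): 3 words on a chain w0 - w1 - w2.
H = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
B = np.eye(2)  # identity probe, chosen only to keep the arithmetic simple
tree_dist = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]], dtype=float)
loss = probe_loss(H, B, tree_dist)
```

In this toy case the predicted distances happen to equal the tree distances, so the loss is zero. With real BERT or ELMo vectors, \(B\) has to be learned to bring the loss down.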
However, there is a problem. If a sentence contains 5 words, we get a 5*5 matrix of predicted distances. How do we find the correct links among the words?
In this paper, the authors build a minimum spanning tree over the predicted distances to recover the dependency parse structure from both ELMo and BERT.
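The minimum spanning tree step can be sketched with Prim's algorithm on a dense distance matrix. The matrix `pred` below contains hypothetical predicted distances for a 5-word sentence, and `minimum_spanning_tree` is an illustrative helper, not a function from the paper's code. Note that the recovered tree is undirected: it says which words are linked, not which word is the head.

```python
import numpy as np

def minimum_spanning_tree(dist):
    """Prim's algorithm on a dense symmetric distance matrix.

    Returns a sorted list of undirected edges (i, j) with i < j.
    """
    n = dist.shape[0]
    in_tree = [0]          # start growing the tree from word 0
    edges = []
    while len(in_tree) < n:
        best = None
        # Find the cheapest edge from the tree to a word outside it.
        for i in in_tree:
            for j in range(n):
                if j in in_tree:
                    continue
                if best is None or dist[i, j] < best[0]:
                    best = (dist[i, j], i, j)
        _, i, j = best
        edges.append((min(i, j), max(i, j)))
        in_tree.append(j)
    return sorted(edges)

# Hypothetical predicted distances for a 5-word sentence.
pred = np.array([
    [0.0, 1.1, 2.0, 2.9, 3.1],
    [1.1, 0.0, 0.9, 2.1, 1.9],
    [2.0, 0.9, 0.0, 1.0, 2.8],
    [2.9, 2.1, 1.0, 0.0, 3.9],
    [3.1, 1.9, 2.8, 3.9, 0.0],
])
edges = minimum_spanning_tree(pred)
print(edges)
```

For this matrix the selected links are words 0-1, 1-2, 1-4 and 2-3, which is 4 edges for 5 words, as expected for a tree.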
Then we will see a tree like: