Sentence Ambiguity, Grammaticality and Complexity Probes

October 2022

Sentence Ambiguity, Grammaticality and Complexity Probes

Sunit Bhattacharya, Vilém Zouhar, Ondřej Bojar

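TL;DR: This paper studies how well pre-trained language models capture subtle linguistic traits, and analyzes how feasible it is to classify those traits and what patterns emerge. It cautions that probing studies should not use datasets that differ only at the surface level, should be compared carefully against baselines, and should not rely on t-SNE plots to decide whether a feature is present in vector representations. It also shows that features can be highly localized in particular layers of these models and can be lost in the upper layers.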
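The two methodological points in the summary, comparing a probe against a baseline and checking each layer separately, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' setup: the representations are synthetic Gaussian vectors rather than language-model activations, the label signal is deliberately injected into a single layer (`signal_layer`), and a nearest-centroid classifier stands in for the probe. All function names are hypothetical.

```python
import random


def majority_baseline(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return max(counts.values()) / len(labels)


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def probe_accuracy(train_x, train_y, test_x, test_y):
    """Nearest-centroid probe: fit class centroids on train, score on test."""
    classes = sorted(set(train_y))
    cents = {
        c: centroid([x for x, y in zip(train_x, train_y) if y == c])
        for c in classes
    }
    correct = 0
    for x, y in zip(test_x, test_y):
        pred = min(classes, key=lambda c: sq_dist(x, cents[c]))
        correct += pred == y
    return correct / len(test_y)


def make_synthetic_layers(n=400, dim=8, n_layers=4, signal_layer=1, seed=0):
    """Synthetic stand-in for per-layer sentence vectors (assumption).

    Only `signal_layer` carries information about the binary label;
    every other layer is pure noise.
    """
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n)]
    rng.shuffle(labels)
    layers = []
    for layer in range(n_layers):
        reps = []
        for y in labels:
            v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            if layer == signal_layer:
                v[0] += 3.0 if y == 1 else -3.0
            reps.append(v)
        layers.append(reps)
    return layers, labels


def probe_all_layers(layers, labels, split):
    """Train and evaluate one probe per layer on a fixed train/test split."""
    return [
        probe_accuracy(reps[:split], labels[:split], reps[split:], labels[split:])
        for reps in layers
    ]


layers, labels = make_synthetic_layers()
split = int(len(labels) * 0.75)
baseline = majority_baseline(labels[split:])
accs = probe_all_layers(layers, labels, split)
for i, acc in enumerate(accs):
    print(f"layer {i}: probe={acc:.2f} baseline={baseline:.2f}")
```

In this toy setting only the layer that carries the injected signal should score clearly above the majority baseline, while the other layers stay near chance. Reporting probe accuracy without the baseline column would hide that contrast.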

Abstract

It is unclear whether, how and where large pre-trained language models capture subtle linguistic traits like ambiguity, grammaticality and sentence complexity. We present results of →