Visualizing and Measuring the Geometry of BERT

June 2019

Andy Coenen, Emily Reif, Ann Yuan, Been Kim, Adam Pearce...

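TL;DR: This paper examines BERT, a particularly effective model, and how it represents linguistic information by extracting generally useful linguistic features in semantic and syntactic subspaces. It also explores syntactic representations in attention matrices and word embeddings, and presents a mathematical argument to explain the geometry of these representations.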

Abstract

Transformer architectures show significant promise for natural language processing. Given that a single pretrained model can be fine-tuned to perform well on many different tasks, these networks appear to extract