We introduce Metric-Learning Encoding Models (MLEMs) as a new approach to understanding how neural systems represent the theoretical features of the objects they process. As a proof of concept, we apply MLEMs to neural representations extracted from BERT and track a wide variety of linguistic features (e.g., tense, subject person, clause type, clause embedding). We find that: (1) linguistic features are ordered: they separate representations of sentences to different degrees in different layers; (2) neural representations are organized hierarchically: in some layers, we find clusters of representations nested within larger clusters, following successively important linguistic features; (3) linguistic features are disentangled in middle layers: distinct, selective units are activated by distinct linguistic features. Methodologically, MLEMs are superior (4) to multivariate decoding methods, in being more robust to type-I errors, and (5) to univariate encoding methods, in being able to predict both local and distributed representations. Together, these results demonstrate the utility of Metric-Learning Encoding Models for studying how linguistic features are neurally encoded in language models, and the advantage of MLEMs over traditional methods. MLEMs can be extended to other domains (e.g., vision) and to other neural systems, such as the human brain.

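Introducing Metric-Learning Encoding Models (MLEMs) as a new method, this study applies MLEMs to neural representations extracted from BERT to track a variety of linguistic features, and finds that: (1) linguistic features are ordered, separating sentence representations to different degrees in different layers; (2) neural representations are organized hierarchically, with representations in some layers nested within larger clusters, following successively important linguistic features; (3) linguistic features are disentangled in middle layers, where distinct linguistic features activate distinct units. Methodologically, MLEMs outperform (4) multivariate decoding methods, being more robust to type-I errors, and (5) univariate encoding methods, being able to predict both local and distributed representations. This demonstrates the utility of Metric-Learning Encoding Models for studying how linguistic features are neurally encoded in language models, and the advantage of MLEMs over traditional methods. MLEMs can be applied to other domains (e.g., vision) and to other neural systems, such as the human brain.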

Metric-Learning Encoding Models Identify Processing Profiles of Linguistic Features in BERT's Representations