Most Chinese pre-trained encoders take the character as the basic unit and learn representations from each character's external context, ignoring the semantics expressed at the word level, even though the word is the smallest meaningful unit in Chinese. Hence, we propose a novel word-aligned attention that incorporates word segmentation information and is complementary to various Chinese pre-trained language models. Specifically, we devise a mixed-pooling strategy to align character-level attention to the word level, and propose an effective fusion method to address the potential issue of segmentation error propagation. As a result, word and character information are explicitly integrated during fine-tuning. Experimental results on various Chinese NLP benchmarks demonstrate that our model brings further significant gains over several pre-trained models.

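Summary: The paper proposes a new word-aligned attention method that takes the word as the basic unit, addressing the inability of existing character-based Chinese pre-trained models to fully exploit word-level semantics. It also handles the potential problem of segmentation error propagation by fusing information from multiple sources. Experimental results show that the model brings significant improvements on five Chinese NLP benchmark tasks.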
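The alignment and fusion steps described above can be illustrated with a minimal sketch. It rests on assumptions the abstract does not confirm: mixed pooling is taken to be a weighted combination `lam * max + (1 - lam) * mean` over the characters of each word, and fusion is taken to be a plain average over the outputs of several segmenters. The function names `mixed_pool_align` and `fuse_segmentations` and the parameter `lam` are hypothetical, introduced only for illustration.

```python
from typing import List, Sequence


def mixed_pool_align(
    char_scores: Sequence[float],
    word_lengths: Sequence[int],
    lam: float = 0.5,
) -> List[float]:
    """Align per-character attention scores to word boundaries.

    Assumption: each word span is pooled as lam * max + (1 - lam) * mean,
    and the pooled value is copied back to every character of that word.
    """
    if any(length <= 0 for length in word_lengths):
        raise ValueError("every word must contain at least one character")
    if sum(word_lengths) != len(char_scores):
        raise ValueError("word lengths must cover every character exactly once")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")

    aligned: List[float] = []
    start = 0
    for length in word_lengths:
        span = char_scores[start:start + length]
        pooled = lam * max(span) + (1.0 - lam) * (sum(span) / length)
        aligned.extend([pooled] * length)  # upsample back to character level
        start += length
    return aligned


def fuse_segmentations(
    char_scores: Sequence[float],
    segmentations: Sequence[Sequence[int]],
    lam: float = 0.5,
) -> List[float]:
    """Average the aligned scores obtained from several segmenters.

    Assumption: a simple mean stands in for the paper's fusion method,
    so that one segmenter's boundary error is diluted by the others.
    """
    if not segmentations:
        raise ValueError("at least one segmentation is required")
    aligned = [mixed_pool_align(char_scores, seg, lam) for seg in segmentations]
    count = len(aligned)
    return [sum(column) / count for column in zip(*aligned)]


if __name__ == "__main__":
    # Five characters segmented as 2 + 2 + 1 by one (hypothetical) segmenter
    # and as 2 + 3 by another.
    scores = [0.1, 0.3, 0.2, 0.2, 0.2]
    print(mixed_pool_align(scores, [2, 2, 1]))
    print(fuse_segmentations(scores, [[2, 2, 1], [2, 3]]))
```

In this sketch, characters inside the same word end up sharing one attention value, which is the sense in which character-level attention is aligned to the word level. Averaging over segmentations means a boundary that only one segmenter proposes influences the result less than one on which all segmenters agree.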

Title: Enhancing Pre-trained Chinese Character Representation with Word-aligned Attention