Word alignment, which aims to extract lexical translation equivalents between source and target sentences, serves as a fundamental tool for natural language processing. Recent studies in this area have yielded substantial improvements by generating alignments from the contextualized embeddings of pre-trained multilingual language models. However, we find that existing approaches capture few interactions between the input sentence pairs, which severely degrades word alignment quality, especially for words that are ambiguous in the monolingual context. To remedy this problem, we propose Cross-Align to model deep interactions between the input sentence pairs: the source and target sentences are encoded separately with shared self-attention modules in the shallow layers, while cross-lingual interactions are explicitly constructed by cross-attention modules in the upper layers. In addition, to train our model effectively, we propose a two-stage training framework, in which the model is trained with a simple Translation Language Modeling (TLM) objective in the first stage and then fine-tuned with a self-supervised alignment objective in the second stage. Experiments show that the proposed Cross-Align achieves state-of-the-art (SOTA) performance on four out of five language pairs.

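This work proposes a word alignment method based on a pre-trained multilingual model. Shared self-attention modules in the shallow layers encode the source and target sentences separately, while cross-attention modules in the upper layers explicitly construct cross-lingual interactions, which improves word alignment quality. A two-stage training framework is used to train the model effectively, and experiments show state-of-the-art performance on four out of five language pairs.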
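
The layer layout described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes single-head attention with no learned projections, feed-forward blocks, or layer normalization, and the names `attend`, `residual`, and `encode_pair`, as well as the layer counts, are hypothetical. It only shows that the shallow layers keep the two sentences apart while the upper layers let each sentence query the other.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Single-head scaled dot-product attention (assumption: no learned weights)."""
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        outputs.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return outputs


def residual(states, updates):
    """Add the attention output back onto the layer input."""
    return [[s + u for s, u in zip(row_s, row_u)]
            for row_s, row_u in zip(states, updates)]


def encode_pair(src, tgt, n_self_layers=2, n_cross_layers=2):
    """Encode a sentence pair: separate self-attention first, cross-attention after."""
    # Shallow layers: each sentence attends only to itself, via the same function
    # (standing in for the shared self-attention modules).
    for _ in range(n_self_layers):
        src = residual(src, attend(src, src, src))
        tgt = residual(tgt, attend(tgt, tgt, tgt))
    # Upper layers: each sentence queries the other one (cross-attention).
    for _ in range(n_cross_layers):
        new_src = residual(src, attend(src, tgt, tgt))
        new_tgt = residual(tgt, attend(tgt, src, src))
        src, tgt = new_src, new_tgt
    return src, tgt
```

With `n_cross_layers=0`, the source representations are identical regardless of which target sentence is supplied; with cross-attention layers enabled, they change with the target. This is the kind of sentence-pair interaction that the abstract says existing approaches capture too little of.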

Cross-Align: Modeling Deep Cross-lingual Interactions for Word Alignment