Bilingual Lexicon Induction via Unsupervised Bitext Construction and Word Alignment

Jan, 2021

Haoyue Shi, Luke Zettlemoyer, Sida I. Wang

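TL;DR: This paper proposes a pipeline that combines unsupervised bitext mining with unsupervised word alignment to produce higher-quality bilingual lexicons, and additionally learns to filter the resulting lexical entries. The final model outperforms the prior state of the art on the BUCC 2020 shared task by 14 F1 points across 12 language pairs, while offering a more interpretable approach and supporting rich reasoning about word meaning in context.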

Abstract

Bilingual lexicons map words in one language to their translations in another, and are typically induced by learning linear projections to align monolingual word embedding spaces. In this paper, we show it is possible to produce much higher quality lexicons with methods that combine (1