BriefGPT.xyz
Jul, 2018
Effective Parallel Corpus Mining using Bilingual Sentence Embeddings
Mandy Guo, Qinlan Shen, Yinfei Yang, Heming Ge, Daniel Cer...
TL;DR
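This study proposes an effective method for parallel corpus mining that trains bilingual sentence embeddings and introduces hard negatives during training. The method is based on semantic similarity. The results show that it can be used to reconstruct parallel text, and that NMT models trained on the reconstructed text perform close to models trained on the original data.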
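The similarity-based mining step summarized above can be illustrated with a minimal sketch. It assumes sentence embeddings have already been computed by some bilingual encoder, and it does not reproduce the paper's model or its hard-negative training. The function names `cosine` and `mine_pairs`, the toy vectors, and the `threshold` value of 0.8 are all hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_u * norm_v)


def mine_pairs(src_vecs, tgt_vecs, threshold=0.8):
    """For each source embedding, keep its nearest target embedding
    if the similarity reaches `threshold` (an assumed cutoff)."""
    if not tgt_vecs:
        return []
    pairs = []
    for i, s in enumerate(src_vecs):
        scores = [cosine(s, t) for t in tgt_vecs]
        j = max(range(len(scores)), key=scores.__getitem__)
        if scores[j] >= threshold:
            pairs.append((i, j, scores[j]))
    return pairs


# Toy embeddings standing in for encoder output (hypothetical values).
src = [[1.0, 0.0, 0.1], [0.0, 1.0, 0.0], [0.5, 0.5, 0.5]]
tgt = [[0.0, 0.9, 0.1], [0.9, 0.1, 0.1], [-1.0, 0.0, 0.0]]

mined = mine_pairs(src, tgt)
for i, j, score in mined:
    print(f"source {i} <-> target {j} (similarity {score:.3f})")
```

In this toy run the third source vector has no sufficiently close target, so it is dropped rather than paired. A real pipeline would replace the exhaustive loop with an approximate nearest-neighbor search over much larger collections.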
Abstract
This paper presents an effective approach for parallel corpus mining using bilingual sentence embeddings. Our embedding models are trained to produce similar representations exclusively for bilingual sentence pairs…