BriefGPT.xyz
Apr, 2023
PAXQA: Generating Cross-lingual Question Answering Examples at Training Scale
Bryan Li, Chris Callison-Burch
TL;DR
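This paper proposes a synthetic data generation method for cross-lingual question answering (QA) that uses existing parallel corpora as indirect supervision, and applies lexically constrained machine translation to improve translation quality. The method yields a dataset of 662K QA examples spanning 4 languages, and ablation studies show that it is relatively robust to noise from automatic word alignment.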
Abstract
Existing question answering (QA) systems owe much of their success to large, high-quality training data. Such annotation efforts are costly, and the difficulty compounds in the cross-lingual setting. Therefore, p