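TL;DR: By proposing a new adversary generation algorithm, AddSentDiverse, and improving the model's ability to learn semantic relationships, the authors achieve an F1 score improvement of nearly 36.5% on the Stanford Question Answering Dataset and make the model more robust.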
Abstract
It is shown that many published models for the Stanford Question Answering
Dataset (Rajpurkar et al., 2016) lack robustness, suffering an over 50%
decrease in F1 score during adversarial evaluation based on the A