Models for conversational question answering (ConvQA) over knowledge graphs (KGs) are usually trained and tested on benchmarks of gold QA pairs. This implies that training is limited to surface forms seen in the respective datasets, and that evaluation is on a small set of held-out questions. Through our proposed framework REIGN, we take several steps to remedy this restricted learning setup. First, we systematically generate reformulations of training questions to increase the robustness of models to surface form variations. This is a particularly challenging problem, given the incomplete nature of such questions. Second, we guide ConvQA models towards higher performance by feeding them only those reformulations that help improve their answering quality, using deep reinforcement learning. Third, we demonstrate the viability of training major model components on one benchmark and applying them zero-shot to another. Finally, for a rigorous evaluation of the robustness of trained models, we use and release large numbers of diverse reformulations generated by prompting GPT for benchmark test sets (resulting in a 20x increase in size). Our findings show that ConvQA models with robust training via reformulations significantly outperform those with standard training from gold QA pairs only.
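The second step, keeping only reformulations that improve answering quality, can be illustrated with a toy sketch. This is not the REIGN implementation: it swaps in a plain REINFORCE update over a context-free softmax policy (a bandit simplification) in place of deep reinforcement learning, and the category names, the `toy_reward` function, and its numeric gains are all hypothetical stand-ins for the change in a ConvQA model's answering quality.

```python
import math
import random

# Hypothetical reformulation categories; the names are illustrative only.
CATEGORIES = ["none", "insert_entity", "delete_entity", "substitute_entity"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_selector(reward_fn, steps=3000, lr=0.1, seed=0):
    """REINFORCE over a context-free softmax policy (a bandit simplification)."""
    rng = random.Random(seed)
    logits = [0.0] * len(CATEGORIES)
    baseline = 0.0  # running average reward, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(CATEGORIES)), weights=probs)[0]
        reward = reward_fn(CATEGORIES[action], rng)
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # Gradient of log pi(action) w.r.t. the logits is one_hot(action) - probs.
        for i in range(len(logits)):
            grad = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * advantage * grad
    return softmax(logits)


def toy_reward(category, rng):
    """Stand-in for the change in answering quality; the values are made up."""
    mean_gain = {
        "none": 0.0,
        "insert_entity": 0.3,
        "delete_entity": -0.2,
        "substitute_entity": 0.1,
    }[category]
    return mean_gain + rng.gauss(0.0, 0.1)


if __name__ == "__main__":
    policy = train_selector(toy_reward)
    for name, p in zip(CATEGORIES, policy):
        print(f"{name}: {p:.3f}")
```

In this toy setting the policy shifts probability mass towards whichever category yields the highest simulated gain. A realistic selector would condition on the question and its conversational context rather than learn a single global preference.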

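Through our proposed REIGN framework, we take several steps to address the restricted learning setup with respect to surface form variations: systematically generating reformulations of training questions, improving the performance of QA models through deep reinforcement learning, and training a model on one benchmark and applying it to another. We rigorously evaluate the robustness of trained models on evaluation data built from large numbers of diverse generated reformulations. The results show that ConvQA models trained with reformulations significantly outperform models that receive only standard training on gold QA pairs.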

Robust Training for Conversational Question Answering Models with Reinforced Reformulation Generation