Recent question generation (QG) approaches often use the sequence-to-sequence (Seq2Seq) framework to optimize the log-likelihood of ground-truth questions using teacher forcing. However, this training objective is inconsistent with actual question quality, which is often reflected by certain global properties such as whether the question can be answered by the document. As such, we directly optimize for QG-specific objectives via reinforcement learning to improve question quality. We design three different rewards that aim to improve the fluency, relevance, and answerability of generated questions. We conduct both automatic and human evaluations, in addition to a thorough analysis, to explore the effect of each QG-specific reward. We find that optimizing question-specific rewards generally leads to better performance on automatic evaluation metrics. However, only the rewards that correlate well with human judgement (e.g., relevance) lead to real improvement in question quality. Optimizing for the others, especially answerability, introduces an incorrect bias into the model, resulting in poor question quality. Our code is publicly available at https://github.com/YuxiXie/RL-for-Question-Generation.

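Summary: Rewards for QG-specific objectives such as fluency, relevance, and answerability are optimized via reinforcement learning to improve the quality of generated questions. Optimizing question-specific rewards generally yields better performance on automatic evaluation metrics; however, only rewards that correlate with human judgement (e.g., relevance) bring real improvement in actual question quality. Optimizing only for the others, such as answerability, introduces an incorrect bias into the model and leads to poor-quality questions.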
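
The reward-driven objective described above can be illustrated with a minimal sketch. It assumes a self-critical policy-gradient baseline, a weighted sum of the three rewards, and a mixing coefficient `gamma` between the likelihood loss and the reinforcement learning loss. None of these choices is stated in the abstract, and all function names, weights, and numbers below are hypothetical.

```python
import math
from typing import Sequence, Tuple


def combined_reward(
    fluency: float,
    relevance: float,
    answerability: float,
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
) -> float:
    """Weighted sum of three scalar rewards (the weights are an assumption)."""
    w_f, w_r, w_a = weights
    return w_f * fluency + w_r * relevance + w_a * answerability


def sequence_log_prob(token_probs: Sequence[float]) -> float:
    """Log-probability of a question given its per-token probabilities."""
    return sum(math.log(p) for p in token_probs)


def self_critical_loss(
    sampled_token_probs: Sequence[float],
    sampled_reward: float,
    baseline_reward: float,
) -> float:
    """Policy-gradient loss with a baseline.

    The sampled question is reinforced when its reward exceeds the reward
    of a baseline question (for example, the greedy decode).
    """
    advantage = sampled_reward - baseline_reward
    return -advantage * sequence_log_prob(sampled_token_probs)


def mixed_loss(ml_loss: float, rl_loss: float, gamma: float = 0.5) -> float:
    """Interpolate the teacher-forcing loss and the RL loss (gamma is assumed)."""
    return gamma * rl_loss + (1.0 - gamma) * ml_loss


# Hypothetical numbers for one sampled question and one baseline question.
r_sampled = combined_reward(fluency=0.6, relevance=0.9, answerability=0.4)
r_baseline = combined_reward(fluency=0.5, relevance=0.7, answerability=0.4)
rl = self_critical_loss([0.4, 0.5, 0.8], r_sampled, r_baseline)
total = mixed_loss(ml_loss=2.3, rl_loss=rl, gamma=0.5)
```

Setting one of the weights to zero in `combined_reward` is one way to study a single reward in isolation, which mirrors the per-reward analysis the abstract describes.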

Exploring Question-Specific Rewards for Generating Deep Questions