Question Answering (QA) is in increasing demand as the amount of information available online grows, along with the desire for quick access to it. A common approach to QA has been to fine-tune a pretrained language model on a task-specific labeled dataset. This paradigm, however, relies on large-scale human-labeled data, which is scarce and costly to obtain. We propose an unsupervised approach to training QA models with generated pseudo-training data. We show that generating questions for QA training by applying a simple template to a related, retrieved sentence, rather than to the original context sentence, improves downstream QA performance by allowing the model to learn more complex context-question relationships. Training a QA model on this data gives a relative improvement in F1 score of about 14% over a previous unsupervised model on the SQuAD dataset, and of about 20% when the answer is a named entity, achieving state-of-the-art performance on SQuAD for unsupervised QA.
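To make the idea of templating a retrieved sentence concrete, here is a minimal sketch of how one pseudo-training example might be built. Several details are assumptions not stated in the abstract: retrieval is approximated by simple word overlap over a small in-memory corpus, the template splits the retrieved sentence as [A][answer][B] and emits "Wh + B + A + ?", and the `WH_WORDS` mapping and all function names are hypothetical.

```python
import re

# Hypothetical mapping from a named-entity label to a wh-word (an assumption).
WH_WORDS = {"PERSON": "Who", "DATE": "When", "GPE": "Where"}


def tokenize(text):
    """Return the set of lowercase word tokens (a deliberately crude tokenizer)."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve_sentence(context_sentence, answer, corpus):
    """Stand-in retriever: pick the corpus sentence that contains the answer and
    shares the most non-answer words with the context sentence, excluding the
    context sentence itself. Returns None if no sentence qualifies."""
    context_tokens = tokenize(context_sentence) - tokenize(answer)
    candidates = [s for s in corpus if answer in s and s != context_sentence]
    if not candidates:
        return None
    return max(candidates, key=lambda s: len(tokenize(s) & context_tokens))


def template_question(sentence, answer, entity_label):
    """Split the sentence as [A][answer][B] and emit 'Wh + B + A + ?'
    (the ordering is an assumed template, chosen for illustration)."""
    before, _, after = sentence.partition(answer)
    fragment_a = before.strip(" ,.")
    fragment_b = after.strip(" ,.")
    wh = WH_WORDS.get(entity_label, "What")
    parts = [p for p in (wh, fragment_b, fragment_a) if p]
    return " ".join(parts) + "?"


def make_training_example(context, context_sentence, answer, entity_label, corpus):
    """Build one (context, question, answer) triple; the question comes from a
    retrieved sentence, while the context stays the original passage."""
    retrieved = retrieve_sentence(context_sentence, answer, corpus)
    if retrieved is None:
        return None
    question = template_question(retrieved, answer, entity_label)
    return {"context": context, "question": question, "answer": answer}
```

Because the question is built from a different sentence than the one in the context, the QA model cannot succeed by matching surface strings alone. The generated questions are not grammatical English; in this sketch they serve only as training signal.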

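In summary: we propose an unsupervised method for training QA models on generated pseudo-data. Questions for QA training are generated by applying a simple template to a related, retrieved sentence rather than to the original context sentence, which lets the model learn more complex context-question relationships. Training a QA model on this data yields a relative F1 improvement of 14% on the SQuAD dataset, and of 20% when the answer is a named entity, achieving state-of-the-art performance for unsupervised QA.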

Key idea: improve unsupervised question answering by generating template-based questions from retrieved sentences.