Conversational Question Answering (ConvQA) models aim to answer a question given its relevant paragraph and the question-answer pairs from previous turns of a multi-turn conversation. To apply such models in a real-world scenario, some existing work uses predicted answers, instead of the unavailable ground-truth answers, as the conversation history at inference time. However, since these models usually predict wrong answers, using all predictions without filtering significantly hampers model performance. To address this problem, we propose to filter out inaccurate answers in the conversation history based on their confidences and uncertainties estimated by the ConvQA model, without making any architectural changes. Moreover, to make the confidence and uncertainty values more reliable, we propose to further calibrate them, thereby smoothing the model predictions. We validate our models, Answer Selection-based realistic Conversation Question Answering, on two standard ConvQA datasets, and the results show that our models significantly outperform relevant baselines. Code is available at: https://github.com/starsuzi/AS-ConvQA.
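
The filtering idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `PredictedTurn` type, the candidate-answer logits, the temperature value, the thresholds, and the use of softmax entropy as the uncertainty measure are all assumptions made for illustration. Temperature scaling stands in for the calibration step, and a predicted answer is kept in the history only if its calibrated confidence is high enough and its entropy is low enough.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class PredictedTurn:
    """One earlier turn whose answer was predicted by the model (hypothetical type)."""
    question: str
    answer: str
    logits: List[float]  # assumed: model scores over candidate answers


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; temperature > 1 smooths the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def confidence_and_entropy(logits: List[float], temperature: float = 1.0):
    """Return (max probability, entropy) of the calibrated distribution."""
    probs = softmax(logits, temperature)
    confidence = max(probs)
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return confidence, entropy


def filter_history(turns: List[PredictedTurn],
                   min_confidence: float = 0.6,  # assumed threshold
                   max_entropy: float = 1.0,     # assumed threshold
                   temperature: float = 1.5) -> List[PredictedTurn]:
    """Keep only turns whose predicted answer looks reliable."""
    kept = []
    for turn in turns:
        conf, ent = confidence_and_entropy(turn.logits, temperature)
        if conf >= min_confidence and ent <= max_entropy:
            kept.append(turn)
    return kept


# Usage: the first turn has a sharply peaked distribution, the second is near uniform.
history = [
    PredictedTurn("Who wrote the report?", "the committee", [5.0, 0.0, 0.0]),
    PredictedTurn("When was it published?", "in spring", [0.2, 0.1, 0.0]),
]
kept = filter_history(history)
```

In this sketch only the first turn survives filtering, so the second, low-confidence prediction would be excluded from the history passed to the model for the next question.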

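The study proposes running inference with predicted answers as the conversation history, filtering out inaccurate answers using their estimated confidence and uncertainty, and finally calibrating the confidence and uncertainty values to improve the performance of the question answering model. Experimental results show that the method significantly outperforms baseline models on two standard ConvQA datasets.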

Realistic Conversational Question Answering with Answer Selection based on Calibrated Confidence and Uncertainty Measurement