Many open-domain dialogue models pre-trained on social media comments can
generate coherent replies but struggle to produce engaging responses
when interacting with real users. This gap may stem mainly from the
scarcity of annotated human-human conversations and from misalignment with
human preferences. In this paper, we propose Diamante, a novel and efficient
approach to improving open-domain chatbots, in which two kinds of human feedback
(explicit demonstration and implicit preference) are collected and
leveraged. By asking annotators to select or amend the model-generated
candidate responses, Diamante efficiently collects human-demonstrated
responses and constructs a Chinese chit-chat dataset. To enhance the alignment
with human preferences, Diamante leverages the implicit preference in the data
collection process and introduces generation-evaluation joint training.
Comprehensive experiments indicate that the Diamante dataset and joint training
paradigm can significantly boost the performance of Chinese pre-trained
dialogue models.

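This paper proposes Diamante, a novel and efficient method that strengthens open-domain chatbots by collecting and leveraging two kinds of human feedback (explicit demonstration and implicit preference), and introduces generation-evaluation joint training to improve alignment with human preferences. Comprehensive experiments show that the Diamante dataset and the joint training paradigm can significantly improve the performance of Chinese pre-trained dialogue models.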
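
The text above does not spell out the joint training objective, so the following is only a minimal sketch of what a generation-evaluation joint loss could look like, not Diamante's actual formulation. It assumes the generation term is the average negative log-likelihood of the human-demonstrated response, and the evaluation term is a pairwise logistic ranking loss that scores the response the annotator selected or amended above the candidates that were passed over. The function names, the pairwise logistic form, and the `weight` parameter are all hypothetical choices made for illustration.

```python
import math


def nll_loss(token_log_probs):
    """Average negative log-likelihood of the human-demonstrated response.

    token_log_probs: log-probabilities the model assigns to each target token.
    """
    return -sum(token_log_probs) / len(token_log_probs)


def pairwise_preference_loss(preferred_score, rejected_scores):
    """Pairwise logistic ranking loss (an assumed form, for illustration).

    Penalizes every rejected candidate whose score is not clearly below
    the score of the response the annotator preferred.
    """
    losses = [
        math.log1p(math.exp(-(preferred_score - rejected)))
        for rejected in rejected_scores
    ]
    return sum(losses) / len(losses)


def joint_loss(token_log_probs, preferred_score, rejected_scores, weight=1.0):
    """Generation loss plus weighted evaluation (preference) loss.

    `weight` is a hypothetical balancing coefficient.
    """
    generation = nll_loss(token_log_probs)
    evaluation = pairwise_preference_loss(preferred_score, rejected_scores)
    return generation + weight * evaluation


if __name__ == "__main__":
    # Two target tokens, each predicted with probability 0.5.
    log_probs = [math.log(0.5), math.log(0.5)]
    # The preferred response is scored above both rejected candidates.
    print(joint_loss(log_probs, preferred_score=2.0, rejected_scores=[0.5, -1.0]))
```

In this sketch, the implicit preference comes for free from the collection process: whichever candidate the annotator kept or amended is treated as preferred, and the remaining candidates act as rejected examples, so no separate preference labeling pass is assumed.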