Dialogue policy learning, a subtask that determines the content of the system response and hence the degree of task completion, is essential for task-oriented dialogue systems. However, the unbalanced distribution of system actions in dialogue datasets often makes it difficult to learn to generate the desired actions and responses. In this paper, we propose a retrieve-and-memorize framework to enhance the learning of system actions. Specifically, we first design a neural context-aware retrieval module that retrieves multiple candidate system actions from the training set given a dialogue context. Then, we propose a memory-augmented multi-decoder network that generates the system actions conditioned on the candidate actions, which allows the network to adaptively select key information in the candidate actions and ignore noise. We conduct experiments on the large-scale multi-domain task-oriented dialogue datasets MultiWOZ 2.0 and MultiWOZ 2.1. Experimental results show that our method achieves performance competitive with several state-of-the-art models on the context-to-response generation task.
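To make the two stages concrete, the following is a minimal illustrative sketch, not the model described in this paper. It assumes that dialogue contexts and candidate actions are already encoded as fixed-length vectors, replaces the neural retrieval module with plain cosine similarity, and replaces the memory-augmented decoder with a single dot-product attention readout over the retrieved candidates. The function names, toy vectors, and action labels are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve_candidates(context_vec, train_index, k=2):
    """Stage 1 (stand-in): return the actions of the k training contexts
    most similar to the current context. `train_index` is a list of
    (context_vector, action_label) pairs; this is an assumed data layout."""
    ranked = sorted(train_index,
                    key=lambda item: cosine(context_vec, item[0]),
                    reverse=True)
    return [action for _, action in ranked[:k]]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def read_memory(query_vec, memory_vecs):
    """Stage 2 (stand-in): attend over candidate-action vectors so that
    relevant slots get high weight and noisy slots get low weight."""
    scores = [sum(q * m for q, m in zip(query_vec, mem)) for mem in memory_vecs]
    weights = softmax(scores)
    dim = len(memory_vecs[0])
    readout = [sum(w * mem[i] for w, mem in zip(weights, memory_vecs))
               for i in range(dim)]
    return weights, readout


# Toy data: hypothetical context vectors paired with hypothetical action labels.
train_index = [
    ([1.0, 0.0], "inform(hotel-price)"),
    ([0.9, 0.1], "request(hotel-area)"),
    ([0.0, 1.0], "bye()"),
]
candidates = retrieve_candidates([1.0, 0.05], train_index, k=2)
weights, readout = read_memory([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

In this sketch the attention weights play the role of the adaptive selection described above: a candidate whose vector aligns with the decoder query dominates the readout, while an unrelated candidate contributes little.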

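This work proposes a "retrieve-and-memorize" framework. It first uses a neural context-aware retrieval module to retrieve multiple candidate system actions from the training set, and then uses a memory-augmented multi-decoder network to generate system actions conditioned on those candidates. The network adaptively selects the key information in the candidate actions while filtering out noise. Experiments show that the method is competitive on the context-to-response generation task.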

Retrieve and Memorize: Dialogue Policy Learning with Multi-Action Memory