In this work, we present a hybrid learning method for training task-oriented dialogue systems through online user interactions. Popular methods for learning task-oriented dialogue include applying reinforcement learning with user feedback to models pre-trained with supervised learning. The efficiency of such learning may suffer from the mismatch in dialogue state distribution between the offline training stage and the online interactive learning stage. To address this challenge, we propose a hybrid imitation and reinforcement learning method with which a dialogue agent can learn effectively from its interactions with users, using both human teaching and human feedback. We design a neural-network-based task-oriented dialogue agent that can be optimized end-to-end with the proposed learning method. Experimental results show that our end-to-end dialogue agent can learn effectively from the mistakes it makes via imitation learning from user teaching. Applying reinforcement learning with user feedback after the imitation learning stage further improves the agent's ability to complete tasks successfully.
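The abstract describes the three-stage schedule (supervised pre-training, imitation learning from user teaching, then reinforcement learning from user feedback) only at a high level. The following is a toy sketch under clearly stated assumptions, not the authors' implementation. It uses a tabular softmax policy instead of the neural agent. It also uses a hypothetical four-turn task in which a wrong action pushes the dialogue into an "off-track" state that the offline corpus never contains. A simulated user (`expert_action`) both teaches and rewards. DAgger-style dataset aggregation stands in for learning from user teaching, and REINFORCE with a moving-average baseline stands in for the feedback stage. All names, rewards, and hyperparameters are illustrative assumptions.

```python
import math
import random

# Hypothetical toy task (not the paper's setup): a dialogue must complete
# NUM_TURNS turns; a wrong action pushes it "off track", and only the
# recovery action 0 brings it back to the same turn.
NUM_TURNS, NUM_ACTIONS, MAX_STEPS = 4, 4, 12
STATES = [(t, off) for t in range(NUM_TURNS) for off in (False, True)]


def expert_action(state):
    """Simulated user teacher: the correct action for a dialogue state."""
    turn, off_track = state
    return 0 if off_track else 1 + turn % (NUM_ACTIONS - 1)


def step(state, action):
    """Toy dialogue dynamics: mistakes go off track, recovery returns."""
    turn, off_track = state
    if action != expert_action(state):
        return (turn, True)
    return (turn, False) if off_track else (turn + 1, False)


def new_policy():
    """Tabular softmax policy: one logit vector per dialogue state."""
    return {s: [0.0] * NUM_ACTIONS for s in STATES}


def probs(logits, state):
    """Numerically stable softmax over the state's logits."""
    z = logits[state]
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def grad_log_prob(logits, state, action, scale, lr):
    """Gradient ascent on scale * log pi(action | state)."""
    p = probs(logits, state)
    for a in range(NUM_ACTIONS):
        logits[state][a] += lr * scale * (float(a == action) - p[a])


def rollout(logits, rng, greedy=False):
    """Run one dialogue with the agent's own policy; return visited pairs."""
    state, visited = (0, False), []
    for _ in range(MAX_STEPS):
        if state[0] == NUM_TURNS:
            break
        p = probs(logits, state)
        if greedy:
            action = max(range(NUM_ACTIONS), key=lambda a: p[a])
        else:
            action = rng.choices(range(NUM_ACTIONS), weights=p)[0]
        visited.append((state, action))
        state = step(state, action)
    return visited, state[0] == NUM_TURNS


def supervised_pretrain(logits, epochs=5, lr=0.5):
    """Stage 1: the offline corpus contains only on-track expert turns."""
    corpus = [((t, False), expert_action((t, False))) for t in range(NUM_TURNS)]
    for _ in range(epochs):
        for state, action in corpus:
            grad_log_prob(logits, state, action, 1.0, lr)


def imitation_from_teaching(logits, rng, episodes=30, lr=0.5):
    """Stage 2 (DAgger-style stand-in): the user labels the states the agent
    actually visits, and the agent retrains on the aggregated labels."""
    dataset = []
    for _ in range(episodes):
        visited, _ = rollout(logits, rng)
        dataset += [(s, expert_action(s)) for s, _ in visited]
        for state, action in dataset:
            grad_log_prob(logits, state, action, 1.0, lr)
    return dataset


def reinforce_from_feedback(logits, rng, episodes=200, lr=0.1):
    """Stage 3: REINFORCE with an episodic user reward and a moving baseline."""
    baseline, returns = 0.0, []
    for _ in range(episodes):
        visited, success = rollout(logits, rng)
        # Hypothetical feedback: +1 on task success, -0.05 per agent turn.
        ret = (1.0 if success else 0.0) - 0.05 * len(visited)
        for state, action in visited:
            grad_log_prob(logits, state, action, ret - baseline, lr)
        baseline = 0.9 * baseline + 0.1 * ret
        returns.append(ret)
    return returns


if __name__ == "__main__":
    rng = random.Random(0)
    policy = new_policy()
    supervised_pretrain(policy)
    print("off-track probs after pre-training:", probs(policy, (0, True)))
    imitation_from_teaching(policy, rng)
    returns = reinforce_from_feedback(policy, rng)
    print("mean return, last 50 RL episodes:", sum(returns[-50:]) / 50)
```

In this toy, supervised pre-training never updates the off-track logits, so they stay uniform. That is the state-distribution mismatch in miniature. The imitation stage can repair it only because labels are collected on states the agent itself reaches, and the reinforcement stage then fine-tunes on whole-dialogue outcomes rather than per-turn labels.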


Dialogue Learning with Human Teaching and Feedback in End-to-End Trainable Task-Oriented Dialogue Systems