End-to-end task-oriented dialog systems often encounter out-of-distribution (OOD) inputs once they are deployed in dynamic, open environments. In this work, we propose SL-Agent, a self-learning framework that combines supervised learning, reinforcement learning, and machine teaching for building end-to-end dialog systems in a more realistic changing-environment setting. SL-Agent consists of a dialog model and a pre-trained reward model that judges the quality of a system response. Guided by this reward model, SL-Agent enables dialog agents to adapt automatically to changes in user behavior by learning from human-bot interactions via reinforcement learning. We validate SL-Agent in four different dialog domains. Both automatic and human evaluations show that SL-Agent adapts effectively to changing environments. Furthermore, experiments on a challenging domain extension setting demonstrate that SL-Agent can adapt to new tasks using limited human corrections provided via machine teaching. We will release code, data, and pre-trained models for further research.

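This paper studies how self-learning can make task bots adapt to dynamic environments and proposes the SL-Agent framework. The framework comprises a dialog model and a pre-trained reward model, learns from human-bot interactions via reinforcement learning with no or minimal human annotation, and is shown to be effective in both automatic and human evaluations.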
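
The interplay between a dialog model and a reward model can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a tabular softmax policy over three hypothetical candidate responses, a keyword-based stand-in for the pre-trained reward model, and a plain REINFORCE update. All names (`CANDIDATES`, `reward_model`, `reinforce_step`, `train`) are invented for illustration.

```python
import math
import random

# Hypothetical candidate system responses for a single dialog context.
CANDIDATES = [
    "Sure, I booked a table for two at 7 pm.",
    "Sorry, I did not understand.",
    "The weather is nice today.",
]


def reward_model(response: str) -> float:
    """Stand-in for a pre-trained reward model (assumption): returns 1.0
    if the response completes the assumed booking task, else 0.0."""
    return 1.0 if "booked" in response else 0.0


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(logits, rng, lr=0.5):
    """One REINFORCE update: sample a response, score it with the reward
    model, and move the logits along reward * grad log pi(action)."""
    probs = softmax(logits)
    action = rng.choices(range(len(logits)), weights=probs, k=1)[0]
    reward = reward_model(CANDIDATES[action])
    # Gradient of log-softmax w.r.t. logit i is 1[i == action] - probs[i].
    return [
        logit + lr * reward * ((1.0 if i == action else 0.0) - probs[i])
        for i, logit in enumerate(logits)
    ]


def train(steps=200, seed=0):
    """Run the toy self-learning loop and return the final response policy."""
    rng = random.Random(seed)
    logits = [0.0] * len(CANDIDATES)
    for _ in range(steps):
        logits = reinforce_step(logits, rng)
    return softmax(logits)


if __name__ == "__main__":
    print([round(p, 3) for p in train()])
```

In this sketch the policy receives no human labels; probability mass shifts toward the response the reward model scores highly, which is the self-learning signal the summary describes in miniature.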

Toward Self-Learning End-to-End Task-Oriented Dialog Systems