Reinforcement learning has been applied to train dialog systems in many works. Previous approaches divide the dialog system into multiple modules, including dialog state tracking (DST) and dialog policy (DP), and train these modules simultaneously. However, the modules influence each other during training: errors from DST can misguide the dialog policy, and changing system actions make the DST module's task harder. To alleviate this problem, we propose an Asynchronous Updating Reinforcement Learning framework (AURL) that updates the DST module and the DP module asynchronously under a cooperative setting. Furthermore, curriculum learning is used to address the unbalanced data distribution that arises during reinforcement learning sampling, and multiple user models are introduced to increase dialog diversity. Results on the public SSD-PHONE dataset show that our method achieves a 31.37% improvement in dialog success rate. The code is publicly available at https://github.com/shunjiu/AURL.
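
The abstract does not spell out the update schedule, so the following is only a minimal sketch of one way asynchronous updating could look, assuming a simple alternating scheme in which one module is updated while the other is held fixed. The names `Module`, `asynchronous_training`, `num_rounds`, and `steps_per_phase` are hypothetical and are not taken from the AURL repository.

```python
from dataclasses import dataclass


@dataclass
class Module:
    """Hypothetical stand-in for a trainable module (DST or DP)."""
    name: str
    updates: int = 0
    frozen: bool = False

    def update(self, batch):
        # A real module would take a gradient step on `batch` here.
        if self.frozen:
            raise RuntimeError(f"{self.name} is frozen and cannot be updated")
        self.updates += 1


def asynchronous_training(dst, dp, num_rounds, steps_per_phase):
    """Alternate phases: update one module while the other is held fixed.

    Assumed schedule (not confirmed by the source): DST phase, then DP phase,
    repeated `num_rounds` times. Returns the order of the active modules.
    """
    schedule = []
    for _ in range(num_rounds):
        for active, fixed in ((dst, dp), (dp, dst)):
            active.frozen, fixed.frozen = False, True
            for _ in range(steps_per_phase):
                batch = None  # placeholder for dialogs sampled with user models
                active.update(batch)
            schedule.append(active.name)
    # Leave both modules unfrozen once training ends.
    dst.frozen = dp.frozen = False
    return schedule


if __name__ == "__main__":
    dst, dp = Module("DST"), Module("DP")
    print(asynchronous_training(dst, dp, num_rounds=2, steps_per_phase=3))
    print(dst.updates, dp.updates)
```

The point of the alternation is that each module trains against a temporarily stationary partner, so DST errors and shifting system actions do not both change within the same phase.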

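This work proposes the Asynchronous Updating Reinforcement Learning framework (AURL), which updates the DST module and the DP module asynchronously under a cooperative setting, applies curriculum learning to address the unbalanced data distribution that arises during reinforcement learning sampling, and introduces multiple user models to increase dialog diversity. Experiments on the public SSD-PHONE dataset show that the method improves the dialog success rate by 31.37%.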

An Asynchronous Updating Reinforcement Learning Framework for Task-Oriented Dialog Systems