Human conversation is inherently complex, often spanning many different topics and domains. This makes policy learning for dialogue systems very challenging. Standard flat reinforcement learning methods do not provide an efficient framework for modelling such dialogues. In this paper, we focus on the under-explored problem of multi-domain dialogue management. First, we propose a new method for hierarchical reinforcement learning using the option framework. Next, we show that the proposed architecture learns faster and arrives at a better policy than existing flat approaches. Moreover, we show how pretrained policies can be adapted to more complex systems with an additional set of new actions. In doing so, we show that our approach has the potential to facilitate policy optimisation for more sophisticated multi-domain dialogue systems.

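This paper proposes using hierarchical reinforcement learning with the option framework for multi-domain dialogue management. Compared with existing flat methods, the approach learns faster and reaches a better policy. The paper also shows how pretrained policies can be adapted to more complex dialogue systems, which points to a way of facilitating policy optimisation for more sophisticated multi-domain dialogue systems.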

Sub-domain Modelling for Dialogue Management with Hierarchical Reinforcement Learning