BriefGPT.xyz
Jun, 2016
Policy Networks with Two-Stage Training for Dialogue Systems
Mehdi Fatemi, Layla El Asri, Hannes Schulz, Jing He, Kaheer Suleman
TL;DR
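This paper proposes statistically optimising dialogue systems with deep policy networks trained by an advantage actor-critic method. It shows that deep reinforcement learning outperforms Gaussian process methods, that dialogue systems based on partially observable Markov decision processes (POMDPs) can be trained efficiently, and that the approach effectively speeds up learning. All experiments are conducted on the DSTC2 restaurant-domain dataset.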
Abstract
In this paper, we propose to use deep policy networks which are trained with an advantage actor-critic method for statistically optimised dialogue systems. First, we show that, on summary state and action spaces, …
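The abstract names the training method but does not spell it out. The following is a minimal pure-Python sketch of a generic one-step advantage actor-critic update. It uses a linear softmax policy (the actor) and a linear state-value function (the critic), and the one-step TD error serves as the advantage estimate. It is not the authors' deep policy network, nor the two-stage training named in the title. The class `LinearActorCritic`, the feature vector `phi`, the learning rates and the one-state toy reward are all hypothetical choices made for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LinearActorCritic:
    """Hypothetical linear softmax actor with a linear state-value critic (illustration only)."""

    def __init__(self, n_features, n_actions, actor_lr=0.1, critic_lr=0.1, gamma=0.99):
        self.theta = [[0.0] * n_features for _ in range(n_actions)]  # actor weights, one row per action
        self.w = [0.0] * n_features  # critic weights: V(s) = w . phi(s)
        self.actor_lr = actor_lr
        self.critic_lr = critic_lr
        self.gamma = gamma

    def policy(self, phi):
        """Action probabilities pi(.|s) for state features phi."""
        logits = [sum(t * x for t, x in zip(row, phi)) for row in self.theta]
        return softmax(logits)

    def value(self, phi):
        """Critic estimate V(s)."""
        return sum(w * x for w, x in zip(self.w, phi))

    def act(self, phi, rng):
        """Sample an action index from the current policy."""
        probs = self.policy(phi)
        return rng.choices(range(len(probs)), weights=probs)[0]

    def update(self, phi, action, reward, phi_next, done):
        """One advantage actor-critic step; returns the advantage estimate."""
        # One-step TD target: bootstrap from V(s') unless the episode has ended.
        target = reward if done else reward + self.gamma * self.value(phi_next)
        # The TD error serves as the advantage estimate A(s, a).
        advantage = target - self.value(phi)
        # Actor: for a linear softmax, grad log pi(a|s) w.r.t. row b is (1[b == a] - pi(b|s)) * phi.
        probs = self.policy(phi)
        for b, row in enumerate(self.theta):
            coeff = (1.0 if b == action else 0.0) - probs[b]
            self.theta[b] = [t + self.actor_lr * advantage * coeff * x for t, x in zip(row, phi)]
        # Critic: semi-gradient step that moves V(s) toward the TD target.
        self.w = [w + self.critic_lr * advantage * x for w, x in zip(self.w, phi)]
        return advantage


# Toy usage (hypothetical): one state, two actions; only action 1 is rewarded, and every episode ends after one turn.
rng = random.Random(0)
agent = LinearActorCritic(n_features=1, n_actions=2)
phi = [1.0]
for _ in range(500):
    a = agent.act(phi, rng)
    r = 1.0 if a == 1 else 0.0
    agent.update(phi, a, r, phi, done=True)

print(agent.policy(phi))  # probability mass should have shifted toward action 1
```

Using the TD error as the advantage keeps the sketch to a single state-value critic. The excerpt above does not say how the paper estimates the advantage or how its networks are structured.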