Apr, 2021
Imperfect also Deserves Reward: Multi-Level and Sequential Reward Modeling for Better Dialog Management
Zhengxu Hou, Bang Liu, Ruihui Zhao, Zijing Ou, Yafei Liu...
TL;DR
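This paper proposes a multi-level reward modeling approach that decomposes the overall reward signal to improve the performance of dialog systems trained with reinforcement learning. Experimental results show that the method improves both the performance and the convergence speed of dialog systems.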
Abstract
For task-oriented dialog systems, training a reinforcement learning (RL) based dialog management module suffers from low sample efficiency and slow convergence speed due to the sparse rewards in RL. To solve this