BriefGPT.xyz
Apr, 2020
Multi-Agent Task-Oriented Dialog Policy Learning with Role-Aware Reward Decomposition
Ryuichi Takanobu, Runze Liang, Minlie Huang
TL;DR
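This paper presents a multi-agent dialog policy learning method that trains the system policy and the user policy simultaneously. It uses role-aware reward decomposition and an actor-critic framework to facilitate pretraining and improve scalability. Results show that the method enables the two agents to complete the task successfully through dialog interaction.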
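The idea of role-aware reward decomposition inside an actor-critic setup can be illustrated with a minimal sketch. The summary does not specify how the reward is split, so this sketch assumes, hypothetically, a three-way split per turn: a system-specific part, a user-specific part, and a shared task reward, each with its own value estimate. Each role's advantage is then the one-step TD error on its own reward stream plus the TD error on the shared stream. The names `DecomposedReward`, `td_error`, and `role_advantage` are illustrative and do not come from the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class DecomposedReward:
    """Hypothetical per-turn reward split; not the paper's exact definition."""
    system: float  # reward credited only to the system agent (assumed)
    user: float    # reward credited only to the user agent (assumed)
    shared: float  # global task reward both agents receive (assumed)


def td_error(reward: float, value: float, next_value: float,
             gamma: float, done: bool) -> float:
    """One-step temporal-difference error: r + gamma * V(s') - V(s)."""
    bootstrap = 0.0 if done else gamma * next_value
    return reward + bootstrap - value


def role_advantage(role: str, r: DecomposedReward,
                   values: dict, next_values: dict,
                   gamma: float = 0.99, done: bool = False) -> float:
    """Advantage for one role: TD error on its own stream plus the shared stream.

    `values` / `next_values` map "system", "user", "shared" to critic
    estimates V(s) and V(s') for each reward stream (assumed layout).
    """
    if role not in ("system", "user"):
        raise ValueError(f"unknown role: {role!r}")
    own = r.system if role == "system" else r.user
    own_td = td_error(own, values[role], next_values[role], gamma, done)
    shared_td = td_error(r.shared, values["shared"], next_values["shared"],
                         gamma, done)
    return own_td + shared_td


if __name__ == "__main__":
    r = DecomposedReward(system=1.0, user=-0.5, shared=0.0)
    values = {"system": 0.5, "user": 0.2, "shared": 1.0}
    next_values = {"system": 0.0, "user": 0.0, "shared": 0.0}
    for role in ("system", "user"):
        print(role, role_advantage(role, r, values, next_values, done=True))
```

Splitting the critic by role means each agent's policy gradient is driven by the part of the reward it is responsible for, while the shared term still pushes both agents toward completing the task together.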
Abstract
Many studies have applied reinforcement learning to train a dialog policy and have shown great promise in recent years. One common approach is to employ a user simulator to obtain a large number of simulated user experiences …