BriefGPT.xyz
Aug, 2019
Guided Dialog Policy Learning: Reward Estimation for Multi-Domain Task-Oriented Dialog
Ryuichi Takanobu, Hanlin Zhu, Minlie Huang
TL;DR
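This study proposes a guided dialog policy learning algorithm based on adversarial inverse reinforcement learning. The algorithm performs reward estimation and policy optimization in multi-domain task-oriented dialog to achieve effective conversations, and it is evaluated through extensive experiments on a multi-domain dialog dataset.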
Abstract
Dialog policy decides what and how a task-oriented dialog system will respond, and plays a vital role in delivering effective conversations. Many studies apply