BriefGPT.xyz
Jan, 2024
Scheduled Curiosity-Deep Dyna-Q: Efficient Exploration for Dialog Policy Learning
Xuecheng Niu, Akinori Ito, Takashi Nose
TL;DR
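A curiosity-driven curriculum learning framework built on the Deep Dyna-Q (DDQ) model. Introducing scheduled learning and curiosity yields a significant improvement when training task-oriented dialog agents. The authors also find that the easy-first and difficult-first strategies are better suited to SC-DDQ and DDQ, respectively.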
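The TL;DR combines two ingredients: a curiosity bonus added to the reward, and a schedule that orders training goals by difficulty (easy-first or difficult-first). The Python sketch below illustrates both in a generic form and is not the SC-DDQ implementation. It assumes a forward-model-style curiosity signal (a scaled squared prediction error, as in common curiosity-driven RL setups) and hypothetical user goals that carry a numeric `difficulty` field. The names `curiosity_bonus`, `shaped_reward`, `schedule_goals` and the weight `eta` are illustrative only.

```python
from typing import Dict, List, Sequence


def curiosity_bonus(predicted_next: Sequence[float],
                    actual_next: Sequence[float],
                    eta: float = 0.5) -> float:
    """Hypothetical intrinsic reward: eta * 0.5 * squared error between a
    forward model's predicted next state and the observed next state.
    Poorly predicted (novel) transitions earn a larger bonus."""
    sq_err = sum((p - a) ** 2 for p, a in zip(predicted_next, actual_next))
    return eta * 0.5 * sq_err


def shaped_reward(extrinsic: float,
                  predicted_next: Sequence[float],
                  actual_next: Sequence[float],
                  eta: float = 0.5) -> float:
    """Extrinsic task reward plus the curiosity bonus."""
    return extrinsic + curiosity_bonus(predicted_next, actual_next, eta)


def schedule_goals(goals: List[Dict], strategy: str,
                   n_stages: int) -> List[List[Dict]]:
    """Split goals into training stages ordered by a hypothetical
    'difficulty' field. 'easy_first' is classic curriculum learning;
    'difficult_first' is its reverse. The sort is stable, so goals of
    equal difficulty keep their input order."""
    if strategy not in ("easy_first", "difficult_first"):
        raise ValueError(f"unknown strategy: {strategy}")
    if n_stages < 1:
        raise ValueError("n_stages must be >= 1")
    ordered = sorted(goals, key=lambda g: g["difficulty"],
                     reverse=(strategy == "difficult_first"))
    stage_size = max(1, -(-len(ordered) // n_stages))  # ceiling division
    return [ordered[i:i + stage_size]
            for i in range(0, len(ordered), stage_size)]
```

For example, with four goals of difficulty 3, 1, 2, 1 and two stages, the easy-first order yields stages of difficulty [1, 1] then [2, 3], while the difficult-first order yields [3, 2] then [1, 1]. Training then proceeds stage by stage, with `shaped_reward` replacing the plain task reward.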
Abstract
Training task-oriented dialog agents based on reinforcement learning is time-consuming and requires a large number of interactions with real users. How to grasp dialog policy within limited dialog experiences rem