BriefGPT.xyz
May, 2023
Scalable and Safe Remediation of Defective Actions in Self-Learning Conversational Systems
Sarthak Ahuja, Mohammad Kachuee, Fateme Sheikholeslami, Weiqing Liu, Jaeyoung Do
TL;DR
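This paper proposes a method for curating and using high-precision data samples drawn from historical regression incident reports to validate, safeguard, and improve policies before online deployment. It addresses the difficulty of balancing policy improvement with experience continuity when off-policy reinforcement learning is applied in large-scale commercial settings, and it improves user satisfaction with the conversational system.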
Abstract
Off-policy reinforcement learning has been a driving force for state-of-the-art conversational AIs, leading to more natural human-agent interactions and improving user satisfaction for goal-oriented agents.