May, 2022
Taming Continuous Posteriors for Latent Variational Dialogue Policies
Marin Vlastelica, Patrick Ernst, Gyuri Szarvas
TL;DR
This work applies amortized variational inference with a Gaussian variational posterior to reinforcement learning, simplifies the training procedure, and proposes a regularization method to preserve response coherence. With this, it achieves the best dialogue success rate in task-oriented dialogue and performs on par with categorical latent methods on the MultiWOZ benchmark.
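The core building block mentioned above, an amortized Gaussian variational posterior, can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: the `encode` function stands in for a learned encoder network, and the sampling step uses the standard reparameterization trick with the closed-form KL divergence to a standard normal prior.

```python
import math
import random

def encode(x):
    # Hypothetical stand-in for a learned encoder that amortizes
    # inference: maps an input feature to (mu, log_var) of q(z|x).
    mu = 0.5 * x
    log_var = -1.0
    return mu, log_var

def sample_latent(mu, log_var, eps=None):
    # Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, 1),
    # keeping the sample differentiable w.r.t. (mu, log_var).
    if eps is None:
        eps = random.gauss(0.0, 1.0)
    sigma = math.exp(0.5 * log_var)
    return mu + sigma * eps

def kl_to_standard_normal(mu, log_var):
    # Closed-form KL(N(mu, sigma^2) || N(0, 1)), the usual ELBO
    # regularizer when training a Gaussian posterior.
    return 0.5 * (math.exp(log_var) + mu * mu - 1.0 - log_var)

mu, log_var = encode(2.0)
z = sample_latent(mu, log_var, eps=0.0)  # eps=0 gives the mean, z = mu
kl = kl_to_standard_normal(mu, log_var)
print(z, round(kl, 4))  # → 1.0 0.6839
```

In a latent-action dialogue policy, `z` would serve as the latent action conditioning the response decoder, and the KL term would regularize the posterior toward the prior during training.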
Abstract
Utilizing amortized variational inference for latent-action reinforcement learning (RL) has been shown to be an effective approach in task-oriented dialogue …