BriefGPT.xyz
Oct, 2023
Policy Gradient with Kernel Quadrature
Satoshi Hayakawa, Tetsuro Morimura
TL;DR
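Using Gaussian process modeling, we select a sample set on which rewards can be computed efficiently, compress the sample information with a "temporal" kernel quadrature method, and then pass the sample set to the policy network for gradient updates, improving the efficiency of reward evaluation in reinforcement learning.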
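The select-then-reweight idea in the TL;DR can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: episodes are represented as fixed-length feature vectors, an RBF kernel stands in for the kernel derived from the Gaussian process model, and greedy kernel herding stands in for the authors' kernel quadrature routine. The names `rbf_kernel`, `kernel_herding`, and `quadrature_weights` are hypothetical.

```python
import numpy as np


def rbf_kernel(X, Y, lengthscale=1.0):
    """RBF kernel matrix between rows of X and Y (assumed stand-in kernel)."""
    sq = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * lengthscale**2))


def kernel_herding(K, n_select):
    """Greedily pick indices whose kernel mean tracks the full-batch kernel mean."""
    mean_embedding = K.mean(axis=1)      # (1/N) * sum_j k(x_i, x_j)
    running = np.zeros(K.shape[0])       # sum of k(x_i, x_s) over picked s
    selected = []
    for t in range(n_select):
        scores = mean_embedding - running / (t + 1)
        if selected:
            scores[selected] = -np.inf   # select without replacement
        idx = int(np.argmax(scores))
        selected.append(idx)
        running += K[:, idx]
    return np.array(selected)


def quadrature_weights(K, selected, jitter=1e-6):
    """Weights w solving K_SS w = z, where z is the batch kernel mean on S."""
    K_ss = K[np.ix_(selected, selected)]
    z = K[selected].mean(axis=1)
    return np.linalg.solve(K_ss + jitter * np.eye(len(selected)), z)


def demo():
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 4))                  # toy episode features (assumed)
    returns = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2  # toy returns (assumed)
    K = rbf_kernel(X, X, lengthscale=2.0)
    selected = kernel_herding(K, 20)
    w = quadrature_weights(K, selected)
    # Only the 20 selected episodes need their returns evaluated.
    estimate = w @ returns[selected]
    print("subset estimate:", estimate, "full-batch mean:", returns.mean())


if __name__ == "__main__":
    demo()
```

In a policy gradient setting, the same weights would multiply the per-episode gradient terms of the selected episodes instead of the toy returns used in `demo`.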
Abstract
Reward evaluation of episodes becomes a bottleneck in a broad range of reinforcement learning tasks. Our aim in this paper is to select a small but representative subset of a large batch of episodes, only on whic