BriefGPT.xyz
Jan, 2023
Generalized Munchausen Reinforcement Learning using Tsallis KL Divergence
Lingwei Zhu, Zheng Chen, Takamitsu Matsubara, Martha White
TL;DR
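This study investigates a generalized KL divergence, called the Tsallis KL divergence, and applies it to policy optimization. By combining it with MVI-based KL regularization, the authors show that the technique effectively improves performance on 35 Atari games.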
Abstract
Many policy optimization approaches in reinforcement learning incorporate a Kullback-Leibler (KL) divergence to the previous policy, to prevent the policy from changing too quickly. This idea was initially propo
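The role of the KL term mentioned in the abstract can be illustrated with a minimal sketch for a single state with discrete actions. Everything here is an assumption made for illustration and is not taken from the paper: the function names `kl_divergence` and `kl_regularized_objective`, the coefficient `tau`, the direction KL(new || old), and the toy numbers. It shows the standard KL penalty only, not the Tsallis generalization.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as lists of probabilities.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def kl_regularized_objective(new_policy, old_policy, q_values, tau):
    """Expected action value minus a KL penalty toward the previous policy.

    tau (hypothetical name) controls how strongly changes are penalized.
    """
    expected_value = sum(p * v for p, v in zip(new_policy, q_values))
    return expected_value - tau * kl_divergence(new_policy, old_policy)

# Toy example: two actions, the first one has the higher value.
old_policy = [0.5, 0.5]
q_values = [1.0, 0.0]

greedy = kl_regularized_objective([1.0, 0.0], old_policy, q_values, tau=1.0)
unchanged = kl_regularized_objective(old_policy, old_policy, q_values, tau=1.0)
print(greedy, unchanged)
```

With a large enough `tau`, jumping straight to the greedy policy scores lower than leaving the policy unchanged, which is the sense in which the penalty keeps the policy from changing too quickly.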