BriefGPT.xyz
Jan, 2023
SoftTreeMax: Exponential Variance Reduction in Policy Gradient via Tree Search
Gal Dalal, Assaf Hallak, Gugan Thoppe, Shie Mannor, Gal Chechik
TL;DR
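This paper proposes SoftTreeMax, a new neural-network scheme that uses tree-based planning to mitigate, on several levels, the large variance and high sample complexity of policy gradient algorithms, and reports strong performance on Atari games.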
Abstract
Despite the popularity of policy gradient methods, they are known to suffer from large variance and high sample complexity. To mitigate this, we introduce SoftTreeMax -- a generalization of softmax that takes pla…
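The summary above describes SoftTreeMax only at a high level: a generalization of softmax that uses tree-based planning. As a rough illustration, the Python sketch below contrasts a plain softmax with a planning-aware variant on a made-up deterministic MDP. Everything in it is an assumption for illustration, not the paper's method: the toy model (`NEXT_STATE`, `REWARD`, `theta`), the helper names `softmax` and `tree_softmax`, the uniform exhaustive expansion over continuations, and the exact scoring rule. The paper should be consulted for its actual definitions and variance analysis.

```python
import itertools
import math

# Hypothetical toy model, NOT from the paper: a deterministic MDP with
# 3 states and 2 actions. NEXT_STATE[s][a] and REWARD[s][a] are made up.
NEXT_STATE = [[1, 2], [2, 0], [0, 1]]
REWARD = [[1.0, 0.0], [0.0, 0.5], [0.2, 0.0]]
N_ACTIONS = 2


def softmax(logits, beta=1.0):
    """Plain softmax: pi(a) is proportional to exp(beta * logits[a])."""
    m = max(logits)  # shift by the max for numerical stability
    exps = [math.exp(beta * (x - m)) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def tree_softmax(state, theta, depth, gamma=0.9, beta=1.0):
    """Hypothetical planning-aware softmax (illustration only).

    For each root action a0, enumerate every continuation of length
    depth - 1 (a uniform, exhaustive tree expansion), compute the
    discounted reward along the path plus gamma**depth * theta[leaf],
    and average exp(beta * that value). Normalizing the averages over
    root actions gives the policy. theta[s] is a made-up per-state logit.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    continuations = list(itertools.product(range(N_ACTIONS), repeat=depth - 1))
    scores = []
    for a0 in range(N_ACTIONS):
        total = 0.0
        for cont in continuations:
            s, ret, disc = state, 0.0, 1.0
            for a in (a0,) + cont:
                ret += disc * REWARD[s][a]  # discounted reward at this step
                s = NEXT_STATE[s][a]        # deterministic transition
                disc *= gamma               # disc == gamma**t after t steps
            # disc is now gamma**depth; add the leaf state's logit
            total += math.exp(beta * (ret + disc * theta[s]))
        scores.append(total / len(continuations))
    z = sum(scores)
    return [x / z for x in scores]


if __name__ == "__main__":
    theta = [0.3, -0.1, 0.8]  # made-up per-state logits
    for d in (1, 2, 4):
        print(d, tree_softmax(0, theta, depth=d))
```

With `depth=1` this sketch reduces to a plain softmax over one-step lookahead logits `REWARD[s][a] + gamma * theta[next_state]`, which is one concrete sense in which such a construction generalizes softmax; larger depths fold in more of the planned trajectory. The exhaustive enumeration costs `N_ACTIONS ** (depth - 1)` paths per root action, so it is only meant for tiny examples.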