BriefGPT.xyz
Dec, 2016
An Alternative Softmax Operator for Reinforcement Learning
A New Softmax Operator for Reinforcement Learning
Kavosh Asadi, Michael L. Littman
TL;DR
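The study finds that the Boltzmann softmax operator is prone to misbehavior in sequential decision making. Building on this finding, it proposes a differentiable softmax operator and introduces a SARSA variant based on the new operator that computes a Boltzmann policy with a state-dependent temperature parameter; the algorithm is shown to be convergent and practical.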
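To make the contrast concrete, here is a minimal Python sketch of the two operators. The Boltzmann operator is the standard exponentially weighted average. The second function assumes the proposed operator is the log-mean-exp form that the paper calls "mellowmax"; this summary does not state the formula or the name, so treat that function, the names `boltzmann_softmax` and `mellowmax`, and the parameters `beta` and `omega` as illustrative assumptions rather than the authors' reference implementation.

```python
import math


def boltzmann_softmax(values, beta):
    """Boltzmann-weighted average: sum_i x_i*exp(beta*x_i) / sum_j exp(beta*x_j)."""
    m = max(values)  # shift by the max so the exponentials cannot overflow
    weights = [math.exp(beta * (v - m)) for v in values]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


def mellowmax(values, omega):
    """Assumed form of the new operator: log(mean(exp(omega*x))) / omega, omega != 0."""
    # Shift by the max (omega > 0) or the min (omega < 0) to keep exponents <= 0.
    m = max(values) if omega > 0 else min(values)
    s = sum(math.exp(omega * (v - m)) for v in values)
    return m + math.log(s / len(values)) / omega


if __name__ == "__main__":
    q_values = [1.0, 2.0, 3.0]  # hypothetical action values for one state
    for temperature in (0.01, 1.0, 100.0):
        print(temperature,
              boltzmann_softmax(q_values, temperature),
              mellowmax(q_values, temperature))
```

With a small parameter both functions return a value close to the mean of the inputs, and with a large parameter both approach the maximum, which is the "somewhat like a max, somewhat like an average" behavior the abstract describes.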
Abstract
A softmax operator applied to a set of values acts somewhat like the maximization function and somewhat like an average. In sequential decision making, softmax is often used in settings where it is necessary to m