An Alternative Softmax Operator for Reinforcement Learning

December 2016

A New Softmax Operator for Reinforcement Learning

Kavosh Asadi, Michael L. Littman

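TL;DR: The paper shows that the Boltzmann softmax operator is prone to misbehavior in sequential decision making. In response, it proposes an alternative, differentiable softmax operator and introduces a SARSA-style algorithm built on the new operator. The algorithm computes a Boltzmann policy with a state-dependent temperature parameter, and it is both convergent and practical.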
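To make the contrast concrete, here is a minimal NumPy sketch of the two operators. It assumes the standard definitions: the Boltzmann softmax is the exponentially weighted average of the values, and the alternative operator (which the paper calls mellowmax) is a log-mean-exp scaled by a parameter. The function names, the argument names `beta` and `omega`, and the max-shift used for numerical stability are illustrative choices, not taken from this summary.

```python
import numpy as np

def boltzmann_softmax(values, beta):
    """Boltzmann-weighted average of `values` with inverse temperature `beta`."""
    values = np.asarray(values, dtype=float)
    # Shift by the max before exponentiating; the weights are unchanged up to scale.
    weights = np.exp(beta * (values - values.max()))
    return float(np.dot(weights, values) / weights.sum())

def mellowmax(values, omega):
    """log(mean(exp(omega * values))) / omega, for omega > 0."""
    values = np.asarray(values, dtype=float)
    shift = values.max()
    # Factor exp(omega * shift) out of the mean to avoid overflow.
    return float(shift + np.log(np.mean(np.exp(omega * (values - shift)))) / omega)

if __name__ == "__main__":
    q_values = [0.1, 0.5, 0.9]  # hypothetical action values for one state
    for param in (0.1, 1.0, 10.0, 100.0):
        print(param, boltzmann_softmax(q_values, param), mellowmax(q_values, param))
```

Both functions move from the mean of the values toward their maximum as the parameter grows, which matches the "somewhat like a maximum, somewhat like an average" description of softmax operators.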

Abstract

A softmax operator applied to a set of values acts somewhat like the maximization function and somewhat like an average. In sequential decision making, softmax is often used in settings where it is necessary to m