Regularized reinforcement learning (RL), particularly the entropy-regularized
kind, has gained traction in optimal control and inverse RL. While standard
unregularized RL methods are unaffected by changes in the number of actions,
we show that such changes can severely impact their regularized counterparts.
This paper demonstrates the importance of decoupling the regularizer from the
action space: that is, maintaining a consistent level of regularization
regardless of how many actions are involved, so as to avoid
over-regularization. Although the problem can be avoided by introducing a
task-specific temperature parameter, doing so is often undesirable and cannot
solve the problem when action spaces are state-dependent. In that setting,
states with different action spaces are regularized inconsistently. We
introduce two
solutions: a static temperature selection approach and a dynamic counterpart,
universally applicable where this problem arises. Implementing these changes
improves performance on the DeepMind control suite, in both static and dynamic
temperature regimes, and on a biological sequence design task.
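
The dependence on the number of actions can be illustrated with a small
numerical sketch. It assumes the common log-sum-exp form of the
entropy-regularized value and a toy set of Q-values with one good action and
many equally poor ones; all function names are hypothetical. The
`scaled_temperature` helper shows one possible normalization (dividing by
log |A|) purely for illustration and is not necessarily the rule the paper
proposes.

```python
import math


def greedy_value(q_values):
    """Unregularized value: depends only on the best action, not on how many there are."""
    return max(q_values)


def soft_value(q_values, temperature):
    """Entropy-regularized value: tau * log(sum_a exp(Q(a) / tau))."""
    m = max(q_values)  # subtract the max for numerical stability
    return m + temperature * math.log(
        sum(math.exp((q - m) / temperature) for q in q_values)
    )


def best_action_probability(q_values, temperature):
    """Probability the softmax policy pi(a) ~ exp(Q(a) / tau) puts on the best action."""
    m = max(q_values)
    weights = [math.exp((q - m) / temperature) for q in q_values]
    return max(weights) / sum(weights)


def scaled_temperature(base_temperature, num_actions):
    """Hypothetical normalization (an assumption, not the paper's stated method):
    divide the temperature by the maximum entropy log|A|. Requires num_actions >= 2."""
    return base_temperature / math.log(num_actions)


if __name__ == "__main__":
    tau = 1.0
    for n in (2, 10, 100):
        q = [1.0] + [0.0] * (n - 1)  # one good action, n - 1 poor ones
        print(
            n,
            greedy_value(q),
            round(soft_value(q, tau), 3),
            round(best_action_probability(q, tau), 3),
            round(best_action_probability(q, scaled_temperature(tau, n)), 3),
        )
```

With a fixed temperature, the greedy value stays the same as actions are
added, while the soft value grows and the policy places less mass on the best
action. For n equally valued actions the soft value exceeds the greedy value
by exactly tau * log(n), which is the maximum entropy bonus.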

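The paper concerns regularized reinforcement learning, in particular
entropy-regularized methods, and their application to optimal control and
inverse reinforcement learning. It points out that changing the number of
actions has no effect on standard unregularized RL methods but can severely
affect regularized ones. To avoid over-regularization, the regularizer must be
decoupled from the action space, and the paper proposes two solutions, a static
temperature selection approach and a dynamic counterpart, that apply
universally wherever this problem arises. Experimental results show that these
changes improve performance on the DeepMind control suite, under both static
and dynamic temperature regimes, and on a biological sequence design task.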