Challenges for Using Impact Regularizers to Avoid Negative Side Effects

Jan, 2021

David Lindner, Kyle Matoba, Alexander Meulemans

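TL;DR: This paper studies how to design reward functions in reinforcement learning that effectively prevent undesired side effects. It focuses on four major challenges facing the impact regularizers proposed in prior work and on ways to address them, and it discusses open problems and directions for future improvement.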

Abstract

Designing reward functions for reinforcement learning is difficult: besides specifying which behavior is rewarded for a task, the reward also has to discourage undesired outcomes. Misspecified reward functions can lead to unintended …
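The requirement that a reward both reward the task and discourage undesired outcomes is what impact regularizers try to address. As a minimal sketch, assuming the general form common in the side-effects literature, the task reward is reduced by a penalty λ·d(s, s′), where d measures how far the current state s deviates from a baseline state s′. The L1 deviation measure, the two-feature state encoding, the "do nothing" baseline, and all numbers below are hypothetical illustrations, not the paper's own method.

```python
from typing import Callable, Tuple

# Hypothetical state encoding: a tuple of environment features,
# here (vase_broken, door_open), each 0.0 or 1.0.
State = Tuple[float, ...]


def l1_deviation(state: State, baseline: State) -> float:
    """Hypothetical deviation measure: L1 distance between feature tuples."""
    return sum(abs(s - b) for s, b in zip(state, baseline))


def regularized_reward(
    task_reward: float,
    state: State,
    baseline_state: State,
    lam: float,
    deviation: Callable[[State, State], float] = l1_deviation,
) -> float:
    """Task reward minus lam times the deviation from the baseline state."""
    return task_reward - lam * deviation(state, baseline_state)


# Assumed baseline: the environment state if the agent had done nothing.
baseline = (0.0, 0.0)
careful = (0.0, 1.0)   # opens a door, leaves the vase intact
shortcut = (1.0, 1.0)  # shorter path, but breaks the vase

# The shortcut earns slightly more task reward (illustrative numbers).
for lam in (0.0, 0.5):
    r_careful = regularized_reward(0.9, careful, baseline, lam)
    r_shortcut = regularized_reward(1.0, shortcut, baseline, lam)
    print(f"lam={lam}: careful={r_careful:.2f}, shortcut={r_shortcut:.2f}")
```

With λ = 0 the shortcut scores higher (1.00 vs 0.90); with λ = 0.5 the ranking flips (0.00 vs 0.40). Opening the door is penalized too, because this toy deviation measure cannot tell intended effects from side effects, which hints at why the choice of baseline and deviation measure is delicate.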