Jan, 2021
Challenges for Using Impact Regularizers to Avoid Negative Side Effects
David Lindner, Kyle Matoba, Alexander Meulemans
TL;DR
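This paper studies how to design reward functions in reinforcement learning that effectively prevent undesirable side effects. It focuses on four major challenges facing the impact regularizers proposed in prior work and on ways to address them, and it discusses open problems and directions for future improvement.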
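For readers unfamiliar with the term, an impact regularizer typically augments the task reward with a penalty of the form r(s, a) − λ·d(s, s_b). Here d measures how far the current state s has moved from a baseline state s_b, and λ ≥ 0 trades off task reward against side effects. The sketch below illustrates only that general form. The L1 deviation measure, the feature encoding of states, and the value of λ are illustrative assumptions, not the specific regularizers analyzed in this paper.

```python
from typing import Callable, Sequence

State = Sequence[float]


def l1_deviation(state: State, baseline: State) -> float:
    """Hypothetical deviation measure: L1 distance between state feature vectors."""
    return sum(abs(s - b) for s, b in zip(state, baseline))


def regularized_reward(
    task_reward: float,
    state: State,
    baseline_state: State,
    deviation: Callable[[State, State], float] = l1_deviation,
    lam: float = 0.5,
) -> float:
    """Task reward minus lam times the deviation of `state` from `baseline_state`."""
    if lam < 0:
        raise ValueError("lam must be non-negative")
    return task_reward - lam * deviation(state, baseline_state)


# Hypothetical features: index 0 = vase intact (1.0) or broken (0.0), index 1 = door closed.
baseline = [1.0, 1.0]  # baseline state: nothing disturbed
actual = [0.0, 1.0]    # the agent completed its task but broke the vase
print(regularized_reward(1.0, actual, baseline, lam=0.5))  # 1.0 - 0.5 * 1.0 = 0.5
```

Choosing the baseline s_b and the deviation measure d is exactly where the design difficulties arise: a poor choice can penalize harmless changes or fail to penalize harmful ones.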
Abstract
Designing reward functions for reinforcement learning is difficult: besides specifying which behavior is rewarded for a task, the reward also has to discourage undesired outcomes. Misspecified reward functions can lead to unintended