Designing reward functions is difficult: the designer has to specify what to do (what it means to complete the task) as well as what not to do (side effects that should be avoided while completing the task). To alleviate the burden on the reward designer, we propose an algorithm to automatically generate an auxiliary reward function that penalizes side effects. This auxiliary objective rewards the ability to complete possible future tasks, which decreases if the agent causes side effects during the current task. The future task reward can also give the agent an incentive to interfere with events in the environment that make future tasks less achievable, such as irreversible actions by other agents. To avoid this interference incentive, we introduce a baseline policy that represents a default course of action (such as doing nothing), and use it to filter out future tasks that are not achievable by default. We formally define interference incentives and show that the future task approach with a baseline policy avoids these incentives in the deterministic case. Using gridworld environments that test for side effects and interference, we show that our method avoids interference and is more effective for avoiding side effects than the common approach of penalizing irreversible actions.

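Designing reward functions is difficult. To address this, the paper proposes an algorithm that automatically generates an auxiliary reward function penalizing side effects. The auxiliary reward gives the agent an incentive to preserve its ability to complete possible future tasks, and it decreases if the agent causes side effects during the current task. Because this reward can also give the agent an incentive to interfere with events in the environment that make future tasks less achievable, such as irreversible actions by other agents, the paper introduces a baseline policy and uses it to filter out future tasks that are not achievable by default. In gridworld experiments, the method avoids interference and is more effective at avoiding side effects than penalizing irreversible actions.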
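The idea of a future task reward filtered by a baseline can be illustrated with a small sketch. This is not the paper's implementation: it assumes a deterministic environment given as a transition graph, treats each future task as "reach this goal state", and replaces discounted value with plain reachability. The names `reachable`, `future_task_reward`, `TRANSITIONS`, and `GOALS`, along with the vase states, are hypothetical and chosen only for illustration.

```python
from collections import deque

# Hypothetical toy environment: a vase that can be broken but never repaired.
TRANSITIONS = {
    "intact": ["intact", "broken"],  # breaking the vase is irreversible
    "broken": ["broken"],
}
GOALS = ["intact", "broken"]  # each candidate future task is "reach this state"


def reachable(transitions, start):
    """Return the set of states reachable from `start` via breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in transitions.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def future_task_reward(transitions, agent_state, baseline_state, goals):
    """Fraction of default-achievable future tasks the agent can still complete.

    Assumption: a task counts as "achievable by default" if its goal state is
    reachable from the state that the baseline policy (e.g. doing nothing)
    would have produced.
    """
    by_default = reachable(transitions, baseline_state)
    by_agent = reachable(transitions, agent_state)
    # Filter out future tasks that are not achievable under the baseline.
    tasks = [g for g in goals if g in by_default]
    if not tasks:
        return 0.0
    return sum(g in by_agent for g in tasks) / len(tasks)
```

In this toy setup, if the baseline leaves the vase intact, an agent that breaks it is penalized: `future_task_reward(TRANSITIONS, "broken", "intact", GOALS)` is lower than the value for an agent that keeps it intact. If the vase would be broken by default (for example, by another agent), the "intact" task is filtered out, so an agent that prevents the breakage gains nothing over one that does not, which is the sense in which the filter removes the interference incentive here.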

Avoiding Side Effects by Considering Future Tasks