We explore the colour versus shape goal misgeneralization originally demonstrated
by Di Langosco et al. (2022) in the Procgen Maze environment, where, given an
ambiguous choice, the agents seem to prefer generalization based on colour
rather than shape. After training over 1,000 agents in a simplified version of
the environment and evaluating them on over 10 million episodes, we conclude
that the behaviour can be attributed to the agents learning to detect the goal
object through a specific colour channel. The choice of channel is arbitrary.
Additionally, we show how, due to underspecification, these preferences can
change when the agents are retrained with exactly the same procedure but a
different random seed for the training run. Finally, we demonstrate that the
training random seed alone can produce outliers in out-of-distribution
behaviour.

Colour versus Shape Goal Misgeneralization in Reinforcement Learning: A Case Study

We study goal misgeneralization, a type of out-of-distribution generalization
failure in reinforcement learning (RL). Goal misgeneralization failures occur
when an RL agent retains its capabilities out-of-distribution yet pursues the
wrong goal. For instance, an agent might continue to competently avoid
obstacles, but navigate to the wrong place. In contrast, previous works have
typically focused on capability generalization failures, where an agent fails
to do anything sensible at test time. We formalize this distinction between
capability and goal generalization, provide the first empirical demonstrations
of goal misgeneralization, and present a partial characterization of its
causes.
