The reward system is one of the fundamental drivers of animal behaviors and
is critical for survival and reproduction. Despite its importance, the problem
of how the reward system has evolved is underexplored. In this paper, we try to
replicate the evolution of biologically plausible reward functions and
investigate how environmental conditions affect the shape of evolved rewards. For
this purpose, we developed a population-based decentralized evolutionary
simulation framework, where agents maintain their energy level to live longer
and produce more children. Each agent inherits its reward function from its
parent subject to mutation and learns to get rewards via reinforcement learning
throughout its lifetime. Our results show that biologically reasonable positive
rewards for food acquisition and negative rewards for motor action can evolve
from randomly initialized ones. However, we also find that the rewards for
motor action diverge into two modes: largely positive and slightly negative.
The emergence of positive motor action rewards is surprising because it can
make agents too active and inefficient in foraging. In environments with poor
and poisonous foods, the evolution of rewards for less important foods tends to
be unstable, while rewards for normal foods are still stable. These results
demonstrate the usefulness of our simulation environment and energy-dependent
birth and death model for further studies of the origin of reward systems.
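
The inheritance and energy-dependent birth and death cycle described above can be sketched roughly as follows. This is a minimal illustration, not the paper's framework: the `Agent` class, the `mutate` and `step` helpers, and every numeric parameter are hypothetical. The lifetime reinforcement learning step is omitted and foraging is replaced by a coin flip, so in this sketch the inherited reward weights do not yet influence behavior or fitness.

```python
import random
from dataclasses import dataclass


@dataclass
class Agent:
    # Hypothetical representation: one scalar reward weight per event type.
    reward_weights: dict
    energy: float = 1.0


def mutate(weights, rng, sigma=0.1):
    """Copy the parent's reward weights, adding Gaussian noise to each."""
    return {name: w + rng.gauss(0.0, sigma) for name, w in weights.items()}


def step(population, rng, food_prob=0.5, food_gain=0.3,
         action_cost=0.1, birth_threshold=2.0, sigma=0.1):
    """Advance the population by one time step (all parameters are assumptions)."""
    survivors, newborns = [], []
    for agent in population:
        # Placeholder for learned foraging: a coin flip decides whether food is found.
        if rng.random() < food_prob:
            agent.energy += food_gain
        agent.energy -= action_cost  # acting always costs energy
        if agent.energy <= 0.0:
            continue  # death: the agent ran out of energy
        if agent.energy >= birth_threshold:
            # Birth: the parent splits its energy with a child that
            # inherits a mutated copy of the parent's reward weights.
            agent.energy /= 2.0
            child = Agent(mutate(agent.reward_weights, rng, sigma), agent.energy)
            newborns.append(child)
        survivors.append(agent)
    return survivors + newborns


if __name__ == "__main__":
    rng = random.Random(0)
    population = [Agent({"food": rng.gauss(0, 1), "action": rng.gauss(0, 1)})
                  for _ in range(10)]
    for _ in range(50):
        population = step(population, rng)
    print(len(population), "agents alive after 50 steps")
```

Because births and deaths happen per agent rather than in synchronized generations, the population size varies over time, which is one way to read "decentralized" in the description above.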

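By simulating the evolution of reward functions and studying how environmental conditions affect their shape, this study finds that positive rewards for food acquisition and negative rewards for motor action emerge, but that the rewards for motor action split into two modes: largely positive and slightly negative. It also finds that in environments with poor and poisonous foods, the evolution of rewards for less important foods is unstable, while rewards for normal foods remain stable. These results demonstrate the usefulness of the simulation environment and the energy-dependent birth and death model for studying the origin of reward systems.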

Evolving rewards for food and actions by simulating birth and death

Evolution of Rewards for Food and Motor Action by Simulating Birth and Death

In the future, artificial learning agents are likely to become increasingly
widespread in our society. They will interact with both other learning agents
and humans in a variety of complex settings including social dilemmas. We argue
that there is a need for research on the intersection between game theory and
artificial intelligence, with the goal of achieving cooperative artificial
intelligence that can navigate social dilemmas well. We consider the problem of
how an external agent can promote cooperation between artificial learners by
distributing additional rewards and punishments based on observing the actions
of the learners. We propose a rule for automatically learning how to create the
right incentives by considering the anticipated parameter updates of each
agent. Using this learning rule leads to cooperation with high social welfare
in matrix games in which the agents would otherwise learn to defect with high
probability. We show that the resulting cooperative outcome is stable in
certain games even if the planning agent is turned off after a given number of
episodes, while other games require ongoing intervention to maintain mutual
cooperation. Finally, we reflect on what the goals of multi-agent reinforcement
learning should be in the first place, and discuss the necessary building
blocks towards the goal of building cooperative AI.
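
To illustrate how additional rewards can change what learners converge to in a matrix game, here is a minimal sketch. It does not implement the learning rule proposed above: the planner's incentive is a fixed, hand-chosen bonus for cooperating rather than one learned from anticipated parameter updates. The prisoner's dilemma payoffs, the `train` function, and the use of exact expected-payoff gradients are all assumptions made for the example.

```python
import math

# Hypothetical prisoner's dilemma payoffs: reward, sucker, temptation, punishment.
R, S, T, P = 3.0, 0.0, 5.0, 1.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train(bonus, steps=2000, lr=0.5):
    """Two gradient-ascent learners; `bonus` is paid to an agent when it cooperates.

    Each agent i cooperates with probability sigmoid(theta[i]) and follows the
    exact gradient of its own expected payoff (game payoff plus planner bonus).
    Returns the final cooperation probabilities of both agents.
    """
    theta = [0.0, 0.0]
    for _ in range(steps):
        p = [sigmoid(t) for t in theta]
        grads = []
        for i in (0, 1):
            q = p[1 - i]  # opponent's cooperation probability
            # Derivative of agent i's expected payoff w.r.t. its own
            # cooperation probability, including the planner's bonus.
            dv_dp = q * (R - T) + (1.0 - q) * (S - P) + bonus
            # Chain rule through the sigmoid parameterization.
            grads.append(dv_dp * p[i] * (1.0 - p[i]))
        theta = [t + lr * g for t, g in zip(theta, grads)]
    return [sigmoid(t) for t in theta]


if __name__ == "__main__":
    print("no planner:  ", train(bonus=0.0))
    print("with planner:", train(bonus=3.0))
```

With these payoffs, the gradient toward cooperation is negative for any opponent strategy when `bonus` is 0, so both learners drift toward defection. A bonus larger than 2 makes it positive for any opponent strategy, so both drift toward cooperation. Whether cooperation would persist after the bonus is removed is not addressed by this sketch.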

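This study explores the intersection of artificial intelligence and game theory, designing an automatic learning rule and a mechanism of rewards and punishments to achieve good social cooperation, with the research goal of building cooperative artificial intelligence.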