Unambiguous identification of the rewards driving behaviours of entities
operating in complex open-ended real-world environments is difficult, partly
because goals and associated behaviours emerge endogenously and are dynamically
updated as environments change. Reproducing such dynamics in models would be
useful in many domains, particularly where fixed reward functions limit the
adaptive capabilities of agents. The simulation experiments described here assess a
candidate algorithm for the dynamic updating of rewards, RULE: Reward Updating
through Learning and Expectation. The approach is tested in a simplified
ecosystem-like setting where experiments challenge entities' survival, calling
for significant behavioural change. The population of entities successfully
demonstrates the abandonment of an initially rewarded but ultimately detrimental
behaviour, amplification of beneficial behaviour, and appropriate responses to
novel items added to their environment. These adjustments happen through
endogenous modification of the entities' underlying reward function, during
continuous learning, without external intervention.

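In complex real-world environments, accurately identifying the rewards that
drive entities' behaviour is difficult, especially as environments change,
because goals and associated behaviours emerge endogenously and are dynamically
updated. This paper examines RULE, a candidate algorithm for dynamically
updating rewards through learning and expectation. Tested in simplified
ecosystem simulation experiments, the approach successfully reproduced
entities' behavioural adjustments, including abandoning a behaviour that was
initially rewarded but ultimately harmful, amplifying beneficial behaviour, and
responding appropriately to novel items in the environment. These adjustments
occur through endogenous modification of the entities' own reward function
during continuous learning, without external intervention.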

Continuously evolving rewards in an open-ended environment

Much research in artificial intelligence is concerned with the development of
autonomous agents that can interact effectively with other agents. An important
aspect of such agents is the ability to reason about the behaviours of other
agents, by constructing models which make predictions about various properties
of interest (such as actions, goals, beliefs) of the modelled agents. A variety
of modelling approaches now exist which vary widely in their methodology and
underlying assumptions, catering to the needs of the different sub-communities
within which they were developed and reflecting the different practical uses
for which they are intended. The purpose of the present article is to provide a
comprehensive survey of the salient modelling methods which can be found in the
literature. The article concludes with a discussion of open problems which may
form the basis for fruitful future research.

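This article introduces approaches in artificial intelligence for developing
autonomous agents that interact effectively with other entities. It focuses on
the different modelling methods and their underlying methodologies and
assumptions, covering both methodology and practical applications, and
concludes by outlining potential topics for future research.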