Classical multi-agent reinforcement learning (MARL) assumes risk neutrality
and complete objectivity for agents. However, in settings where agents need to
consider or model human economic or social preferences, a notion of risk must
be incorporated into the RL optimization problem. This will be of greater
importance in MARL where other human or non-human agents are involved, possibly
with their own risk-sensitive policies. In this work, we consider
risk-sensitive and non-cooperative MARL with cumulative prospect theory (CPT),
a non-convex risk measure and a generalization of coherent measures of risk.
CPT is capable of explaining loss aversion in humans and their tendency to
overestimate small probabilities and underestimate large ones. We propose a distributed
sampling-based actor-critic (AC) algorithm with CPT risk for network
aggregative Markov games (NAMGs), which we call Distributed Nested CPT-AC.
Under a set of assumptions, we prove the convergence of the algorithm to a
subjective notion of Markov perfect Nash equilibrium in NAMGs. The experimental
results show that subjective CPT policies obtained by our algorithm can be
different from the risk-neutral ones, and agents with a higher loss aversion
are more inclined to socially isolate themselves in an NAMG.
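
The two CPT ingredients mentioned above, loss aversion and probability distortion, can be illustrated with a minimal sketch. The functional forms and default parameters below are the commonly cited ones from Tversky and Kahneman (1992); they are an assumption for illustration only, and the function names `cpt_value` and `cpt_weight` are hypothetical. The parametrization used by Distributed Nested CPT-AC may differ.

```python
def cpt_value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Subjective value of an outcome x relative to a reference point of 0.

    Gains are concave (x ** alpha); losses are scaled by lam > 1,
    which models loss aversion. Parameters are assumed, not from the paper.
    """
    if x >= 0:
        return x ** alpha
    return -lam * ((-x) ** beta)


def cpt_weight(p, gamma=0.61):
    """Inverse-S-shaped probability weighting function.

    For gamma < 1 it inflates small probabilities and deflates
    large ones. The value of gamma is assumed, not from the paper.
    """
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    num = p ** gamma
    return num / (num + (1.0 - p) ** gamma) ** (1.0 / gamma)


if __name__ == "__main__":
    # A loss of 10 weighs more heavily than a gain of 10.
    print(cpt_value(10.0), cpt_value(-10.0))
    # A 1% chance is perceived as larger, a 99% chance as smaller.
    print(cpt_weight(0.01), cpt_weight(0.99))
```

With these assumed parameters, a loss is weighted 2.25 times as heavily as a gain of the same size, and increasing `lam` corresponds to the "higher loss aversion" discussed in the experiments.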

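A distributed sampling-based actor-critic (AC) algorithm with cumulative prospect theory (CPT) risk introduces risk sensitivity into network aggregative Markov games (NAMGs) and converges to a subjective notion of Markov perfect Nash equilibrium. The experimental results show that the subjective CPT policies obtained by the algorithm can differ from risk-neutral ones, and that agents with higher loss aversion are more inclined to socially isolate themselves in an NAMG.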

Risk-Sensitive Multi-Agent Reinforcement Learning in Network Aggregative Markov Games

We study the game modification problem, where a benevolent game designer or a
malevolent adversary modifies the reward function of a zero-sum Markov game so
that a target deterministic or stochastic policy profile becomes the unique
Markov perfect Nash equilibrium and has a value within a target range, in a way
that minimizes the modification cost. We characterize the set of policy
profiles that can be installed as the unique equilibrium of some game, and
establish sufficient and necessary conditions for successful installation. We
propose an efficient algorithm, which solves a convex optimization problem with
linear constraints and then performs random perturbation, to obtain a
modification plan with a near-optimal cost.

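This work studies the game modification problem, in which a benevolent game designer or a malevolent adversary modifies the reward function of a zero-sum Markov game at minimal cost so that a target deterministic or stochastic policy profile becomes the unique Markov perfect Nash equilibrium with a value inside a target range. It characterizes the set of policy profiles that can be installed as the unique equilibrium of some game and gives sufficient and necessary conditions for successful installation. It also proposes an efficient algorithm that solves a convex optimization problem with linear constraints and then applies a random perturbation to obtain a modification plan with near-optimal cost.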