Learning to cooperate with other agents is challenging when those agents also
possess the ability to adapt to our own behavior. Practical and theoretical
approaches to learning in cooperative settings typically assume that other
agents' behaviors are stationary, or else make very specific assumptions about
other agents' learning processes. The goal of this work is to understand
whether we can reliably learn to cooperate with other agents without such
restrictive assumptions, which are unlikely to hold in real-world applications.
Our main contribution is a set of impossibility results, which show that no
learning algorithm can reliably learn to cooperate with all possible adaptive
partners in a repeated matrix game, even if that partner is guaranteed to
cooperate with some stationary strategy. Motivated by these results, we then
discuss potential alternative assumptions which capture the idea that an
adaptive partner will only adapt rationally to our behavior.
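
The intuition behind such a result can be illustrated with a toy example. The
sketch below is not the construction used in the proofs; it only assumes a
hypothetical partner (`GrimPartner`) that cooperates with exactly one
stationary strategy, "always play the hidden target action", and withdraws
cooperation permanently after any deviation. The helper names
(`cooperation_rate`, `always`, `explore_then_commit`) are likewise
illustrative assumptions.

```python
class GrimPartner:
    """Hypothetical adaptive partner with a hidden target action.

    It cooperates only while the learner has played the target action in
    every round so far; a single deviation ends cooperation for good.
    """

    def __init__(self, target):
        self.target = target
        self.triggered = False

    def step(self, learner_action):
        """Process one round; return True if the round ends in cooperation."""
        if learner_action != self.target:
            self.triggered = True
        return not self.triggered


def cooperation_rate(learner, target, rounds=100):
    """Fraction of cooperative rounds when `learner` meets GrimPartner(target)."""
    partner = GrimPartner(target)
    history = []  # list of (learner_action, cooperated) pairs
    successes = 0
    for _ in range(rounds):
        action = learner(history)
        cooperated = partner.step(action)
        history.append((action, cooperated))
        successes += int(cooperated)
    return successes / rounds


def always(action):
    """Stationary learner that ignores the history."""
    return lambda history: action


def explore_then_commit(history, num_actions=3):
    """Adaptive learner: repeat any action that worked, else try the next one."""
    for action, cooperated in history:
        if cooperated:
            return action
    return len(history) % num_actions


if __name__ == "__main__":
    for target in range(3):
        print(
            target,
            cooperation_rate(always(target), target),
            cooperation_rate(explore_then_commit, target),
        )
```

Each partner in this family cooperates fully with its matching stationary
strategy, yet a learner that does not know the target must guess in the first
round, and a wrong guess cannot be repaired by later adaptation.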

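This work aims to understand whether we can reliably learn to cooperate with
other adaptive agents without making specific assumptions about them. It
derives a set of impossibility results showing that no learning algorithm can
reliably learn to cooperate with all possible adaptive partners, even if the
partner is guaranteed to cooperate with some stationary strategy. It then
discusses potential alternative assumptions that capture the idea that an
adaptive partner will only adapt rationally to our behavior.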

On the Problem of Being Unable to Learn Adaptive Cooperation Strategies in Repeated Games

On the Impossibility of Learning to Cooperate with Adaptive Partner Strategies in Repeated Games

We consider a scenario in which two reinforcement learning agents repeatedly
play a matrix game against each other and update their parameters after each
round. The agents' decision-making is transparent to each other, which allows
each agent to predict how its opponent will play against it. To prevent an
infinite regress in which the agents recursively predict each other, each
agent is required to give an opponent-independent response with probability
at least epsilon. Transparency also allows each agent to anticipate
and shape the other agent's gradient step, i.e. to move to regions of parameter
space in which the opponent's gradient points in a direction favourable to
them. We study the resulting dynamics experimentally, using two algorithms from
previous literature (LOLA and SOS) for opponent-aware learning. We find that
the combination of mutually transparent decision-making and opponent-aware
learning robustly leads to mutual cooperation in a single-shot prisoner's
dilemma. In a game of chicken, in which both agents try to manoeuvre their
opponent towards their preferred equilibrium, converging to a mutually
beneficial outcome turns out to be much harder, and opponent-aware learning can
even lead to worst-case outcomes for both agents. This highlights the need to
develop opponent-aware learning algorithms that achieve acceptable outcomes in
social dilemmas involving an equilibrium selection problem.
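
The role of the epsilon requirement can be sketched with hand-written
policies. This is a minimal illustration under stated assumptions, not the
parametrized reinforcement learning agents studied in the experiments: the
"mirror" policy, the function names, and the action labels "C" and "D" are
all hypothetical. With probability epsilon the agent answers without
consulting its opponent; otherwise it simulates the opponent and copies the
predicted action, so the mutual simulation terminates with probability 1.

```python
import random


def make_mirror_agent(epsilon, grounded_action="C"):
    """Hypothetical epsilon-grounded agent that copies its predicted opponent."""

    def agent(opponent, rng):
        # With probability epsilon, give an opponent-independent response.
        if rng.random() < epsilon:
            return grounded_action
        # Otherwise simulate the opponent playing against this agent
        # and mirror the predicted action.
        return opponent(agent, rng)

    return agent


def defector(opponent, rng):
    """Opponent-independent agent that always defects."""
    return "D"


def play(agent_a, agent_b, rng):
    """One round: each agent picks an action given access to the other."""
    return agent_a(agent_b, rng), agent_b(agent_a, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    mirror = make_mirror_agent(epsilon=0.2)
    print(play(mirror, mirror, rng))    # two mirror agents
    print(play(mirror, defector, rng))  # mirror agent against a defector
```

Two mirror agents always end up cooperating, because every chain of
simulations bottoms out in the grounded action. Against the defector, the
mirror agent defects except when its own grounded response fires.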

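This paper considers a setting in which two reinforcement learning agents
repeatedly play a matrix game against each other, and examines how transparent
decision-making lets each agent predict its opponent and shape its opponent's
gradient step. It investigates whether combining transparent decision-making
with opponent-aware learning can achieve acceptable payoffs in the prisoner's
dilemma and the game of chicken, and finds that the combination leads both
players to mutual cooperation in the prisoner's dilemma. In the game of
chicken, because of the equilibrium selection problem, suitable opponent-aware
learning algorithms still need to be developed.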