In a growing range of settings, a human player has to interact with
artificial players whose decisions are made by algorithms. How should the
human player play against these algorithms to maximize their utility? Does
anything change if they face more than one artificial player? The main goal of
the paper is to answer these two questions. We consider n-player games in
normal form repeated over time, where we call the human player the optimizer
and the (n - 1) artificial players the learners. We assume that
learners play no-regret algorithms, a class of algorithms widely used in online
learning and decision-making. In these games, we consider the concept of
Stackelberg equilibrium. In a recent paper, Deng, Schneider, and Sivan have
shown that in a 2-player game the optimizer can always guarantee an expected
cumulative utility of at least the Stackelberg value per round. In our first
result, we show, with counterexamples, that this guarantee no longer holds if
the optimizer has to face more than one learner. Therefore, we generalize the
definition of Stackelberg equilibrium by introducing the concept of correlated
Stackelberg equilibrium. Finally, in the main result, we prove that the
optimizer can guarantee at least the correlated Stackelberg value per round.
Moreover, using a version of the strong law of large numbers, we show that this
guarantee also holds almost surely for the optimizer's realized utility, not
only for the optimizer's expected utility.

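This paper studies how a human player can maximize utility in repeated n-player normal-form games against learners that run no-regret algorithms. It introduces the correlated Stackelberg equilibrium as a generalization of the Stackelberg equilibrium and proves that the optimizer can guarantee at least the correlated Stackelberg value per round.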
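
To make the Stackelberg value in the 2-player case concrete, here is a minimal sketch that approximates it by a grid search over the optimizer's mixed strategies. It is an illustration only, not the paper's construction: the function name `stackelberg_value`, the grid search, the restriction to an optimizer with two actions, and the example payoff matrices `OPT` and `LRN` are all assumptions introduced here. It also assumes the common convention that the learner best-responds to the committed strategy and that ties are broken in the optimizer's favor.

```python
from fractions import Fraction

# Hypothetical 2x2 example (not from the paper).
# Rows: optimizer actions U, D. Columns: learner actions L, R.
OPT = [[1, 3], [0, 2]]   # optimizer's payoffs
LRN = [[1, 0], [0, 1]]   # learner's payoffs


def stackelberg_value(opt_payoff, learner_payoff, grid=100):
    """Grid approximation of the optimizer's Stackelberg value.

    The optimizer commits to playing its first action with probability p;
    the learner best-responds, with ties broken in the optimizer's favor
    (an assumed convention). Returns (value, p) for the best grid point.
    """
    n_learner = len(learner_payoff[0])
    best = None
    for k in range(grid + 1):
        p = Fraction(k, grid)          # exact arithmetic keeps ties exact
        mix = (p, 1 - p)
        learner_u = [
            sum(mix[i] * learner_payoff[i][j] for i in range(2))
            for j in range(n_learner)
        ]
        top = max(learner_u)
        # Among the learner's best responses, take the one the optimizer prefers.
        value = max(
            sum(mix[i] * opt_payoff[i][j] for i in range(2))
            for j in range(n_learner)
            if learner_u[j] == top
        )
        if best is None or value > best[0]:
            best = (value, p)
    return best


value, p = stackelberg_value(OPT, LRN)
print(f"Stackelberg value ~ {value} at P(U) = {p}")
```

In this hypothetical game the optimizer's action U is dominant, so a best-responding learner would play L and the optimizer would earn 1. Committing to play U with probability 1/2 makes the learner indifferent, and with favorable tie-breaking the optimizer earns 5/2. A grid search only approximates the value in general; here the optimum happens to lie on the grid.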

Playing against no-regret players

Understanding the behavior of no-regret dynamics in general $N$-player games
is a fundamental question in online learning and game theory. A folk result in
the field states that, in finite games, the empirical frequency of play under
no-regret learning converges to the game's set of coarse correlated equilibria.
By contrast, our understanding of how the day-to-day behavior of the dynamics
correlates to the game's Nash equilibria is much more limited, and only partial
results are known for certain classes of games (such as zero-sum or congestion
games). In this paper, we study the dynamics of "follow-the-regularized-leader"
(FTRL), arguably the most well-studied class of no-regret dynamics, and we
establish a sweeping negative result showing that the notion of mixed Nash
equilibrium is antithetical to no-regret learning. Specifically, we show that
any Nash equilibrium that is not strict (strict meaning that every player has a
unique best response) cannot be stable and attracting under the dynamics of FTRL. This
result has significant implications for predicting the outcome of a learning
process as it shows unequivocally that only strict (and hence, pure) Nash
equilibria can emerge as stable limit points thereof.

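This paper studies the behavior of follow-the-regularized-leader (FTRL), one of the most widely studied classes of no-regret dynamics. It shows that Nash equilibria that are not strict cannot be stable and attracting under these dynamics, so only strict Nash equilibria can emerge as stable limit points of no-regret learning.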
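
As a rough illustration of why a mixed equilibrium need not attract FTRL, the sketch below runs a discrete-time FTRL with an entropic regularizer (that is, exponential weights) for both players in Matching Pennies, whose only Nash equilibrium is the uniform mixed profile. This is a single hand-picked example under stated assumptions, not the paper's analysis: the game, the step size `eta`, the initial scores, and the function names are all choices made here for illustration. The sketch tracks the sum of the KL divergences from the uniform equilibrium to the two players' current strategies.

```python
import math


def softmax(scores, eta):
    """Entropic-regularizer FTRL choice map: weights proportional to exp(eta * score)."""
    m = max(scores)  # subtract the max for numerical stability
    w = [math.exp(eta * (s - m)) for s in scores]
    total = sum(w)
    return [v / total for v in w]


def kl_from_uniform(p):
    """KL divergence from the uniform distribution to p."""
    n = len(p)
    return sum((1.0 / n) * math.log((1.0 / n) / q) for q in p)


def run_ftrl_matching_pennies(steps=200, eta=0.1,
                              init_scores=((1.0, 0.0), (0.0, 0.0))):
    """Simultaneous exponential-weights play in Matching Pennies.

    Returns the per-round sum of KL divergences from the mixed equilibrium
    (uniform, uniform) to the players' current mixed strategies.
    """
    A = [[1.0, -1.0], [-1.0, 1.0]]   # row player's payoffs; column player gets -A
    s_row, s_col = list(init_scores[0]), list(init_scores[1])
    divergences = []
    for _ in range(steps):
        x, y = softmax(s_row, eta), softmax(s_col, eta)
        divergences.append(kl_from_uniform(x) + kl_from_uniform(y))
        # Expected payoff of each pure action against the opponent's current mix.
        u_row = [sum(A[i][j] * y[j] for j in range(2)) for i in range(2)]
        u_col = [-sum(A[i][j] * x[i] for i in range(2)) for j in range(2)]
        # FTRL keeps cumulative payoff scores and re-applies the choice map.
        s_row = [s + u for s, u in zip(s_row, u_row)]
        s_col = [s + u for s, u in zip(s_col, u_col)]
    return divergences


d = run_ftrl_matching_pennies()
print(f"divergence from equilibrium: first round {d[0]:.6f}, last round {d[-1]:.6f}")
```

With these settings the divergence grows from round to round, so play drifts away from the mixed equilibrium instead of settling on it. The example uses simultaneous discrete-time updates with a fixed step size, which is only one member of the FTRL family; it is consistent with the statement above but does not substitute for the paper's general argument.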