In this paper, we explore the susceptibility of the Q-learning algorithm (a classical and widely used reinforcement learning method) to strategic manipulation by sophisticated opponents in games. We quantify how much a strategically sophisticated agent can exploit a naive Q-learner if she knows the opponent's Q-learning algorithm. To this end, we formulate the strategic actor's problem as a Markov decision process (with a continuum state space encompassing all possible Q-values), treating the Q-learning algorithm as the underlying dynamical system. We also present a quantization-based approximation scheme to tackle the continuum state space and analyze its performance both analytically and numerically.
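
As a rough illustration of the formulation described above, the following minimal sketch treats the learner's Q-values as the state of a Markov decision process, snaps them to a finite grid, and runs value iteration for the strategic agent. It is not the paper's scheme: the stateless greedy learner, the payoff matrices `r_strategic` and `r_learner`, the uniform `grid`, and the parameters `alpha`, `gamma`, and `beta` are all assumptions made for this example.

```python
import itertools
import numpy as np

def q_update(q, a, b, r_learner, alpha=0.5, gamma=0.9):
    """One stateless Q-learning step for the learner's played action b (assumed update rule)."""
    q_next = q.copy()
    q_next[b] += alpha * (r_learner[a, b] + gamma * q.max() - q[b])
    return q_next

def quantize(q, grid):
    """Map each Q-value to the index of its nearest grid point."""
    idx = np.abs(grid[None, :] - q[:, None]).argmin(axis=1)
    return tuple(int(i) for i in idx)

def solve_quantized_mdp(r_strategic, r_learner, grid, beta=0.9, iters=100):
    """Value iteration for the strategic agent over the quantized Q-value space."""
    n_a, n_b = r_strategic.shape
    # A state is a tuple of grid indices, one per learner action.
    states = list(itertools.product(range(len(grid)), repeat=n_b))
    V = {s: 0.0 for s in states}
    policy = {}
    for _ in range(iters):
        V_new = {}
        for s in states:
            q = grid[list(s)]
            b = int(q.argmax())  # assumption: the learner acts greedily on its Q-values
            values = []
            for a in range(n_a):
                # The learner's update is the (deterministic) system dynamics.
                s_next = quantize(q_update(q, a, b, r_learner), grid)
                values.append(r_strategic[a, b] + beta * V[s_next])
            policy[s] = int(np.argmax(values))
            V_new[s] = max(values)
        V = V_new
    return V, policy

# Hypothetical 2x2 game; rows are the strategic agent's actions, columns the learner's.
r_strategic = np.array([[3.0, 0.0], [1.0, 2.0]])
r_learner = np.array([[1.0, 0.0], [0.0, 1.0]])
grid = np.linspace(0.0, 10.0, 11)  # covers [0, r_max / (1 - gamma)] for these payoffs
V, policy = solve_quantized_mdp(r_strategic, r_learner, grid)
```

Here `policy` maps each quantized Q-vector to the strategic agent's action, and a finer `grid` trades computation for a closer approximation of the continuum state space.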

Strategizing against Q-learners: A Control-Theoretic Approach