In this paper, we explore the susceptibility of the Q-learning algorithm (a
classical and widely used reinforcement learning method) to strategic
manipulation by sophisticated opponents in games. We quantify how much a
strategically sophisticated agent can exploit a naive Q-learner if she knows
the opponent's Q-learning algorithm. To this end, we formulate the strategic
actor's problem as a Markov decision process (with a continuum state space
encompassing all possible Q-values) in which the Q-learning algorithm acts as the
underlying dynamical system. We also present a quantization-based approximation
scheme to tackle the continuum state space and analyze its performance both
analytically and numerically.
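
The formulation above can be illustrated with a minimal sketch. It is not the construction analyzed in the paper: it assumes a repeated two-player matrix game, a stateless and myopic Q-learner that plays greedily (ties broken toward the lowest action index), and a uniform quantization grid. The names `q_update`, `quantize`, `solve_quantized_mdp`, the step size `alpha`, and the strategic actor's discount factor `delta` are hypothetical and introduced only for illustration.

```python
import itertools

import numpy as np


def q_update(q, a, b, U_learner, alpha):
    """One step of the naive learner (assumed stateless and myopic):
    only the Q-value of the action it just played is moved."""
    q_next = q.copy()
    q_next[a] += alpha * (U_learner[a, b] - q[a])
    return q_next


def quantize(q, grid):
    """Map each Q-value to the index of its nearest grid point."""
    return tuple(int(np.abs(grid - v).argmin()) for v in q)


def solve_quantized_mdp(U_learner, U_strategic, alpha=0.5, delta=0.9,
                        levels=11, sweeps=200):
    """Value iteration for the strategic actor on a quantized Q-value grid.

    State: the learner's (quantized) Q-vector.
    Action: the strategic actor's move b.
    Transition: the learner's Q-update, rounded back onto the grid.
    """
    n_a, n_b = U_learner.shape
    # With alpha in [0, 1], Q-values stay within the learner's payoff range.
    grid = np.linspace(U_learner.min(), U_learner.max(), levels)
    states = list(itertools.product(range(levels), repeat=n_a))
    V = {s: 0.0 for s in states}
    policy = {}
    for _ in range(sweeps):
        V_new = {}
        for s in states:
            q = grid[list(s)]
            a = int(np.argmax(q))  # greedy learner, ties go to lowest index
            values = []
            for b in range(n_b):
                s_next = quantize(q_update(q, a, b, U_learner, alpha), grid)
                values.append(U_strategic[a, b] + delta * V[s_next])
            policy[s] = int(np.argmax(values))
            V_new[s] = max(values)
        V = V_new
    return grid, V, policy


if __name__ == "__main__":
    # Hypothetical game: the learner is paid for matching actions,
    # the strategic actor for mismatching them.
    U_learner = np.array([[1.0, 0.0], [0.0, 1.0]])
    U_strategic = np.array([[0.0, 1.0], [1.0, 0.0]])
    grid, V, policy = solve_quantized_mdp(U_learner, U_strategic)
    print(len(V), "quantized states")
```

A finer grid (larger `levels`) tracks the continuum state space more closely, at the cost of a state count that grows exponentially in the number of learner actions.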
