Nash Q-learning is among the earliest and best-known algorithms
in multi-agent reinforcement learning (MARL) for learning policies that
constitute a Nash equilibrium of an underlying general-sum Markov game. Its
original analysis provided only asymptotic guarantees and covered only the
tabular case. More recently, finite-sample guarantees have been established for
the tabular case using modern RL techniques. Our work analyzes Nash Q-learning using linear
function approximation -- a representation regime introduced when the state
space is large or continuous -- and provides finite-sample guarantees that
indicate its sample efficiency. We find that the resulting performance
guarantee nearly matches an existing efficient result for single-agent RL under
the same representation and is within a polynomial gap of the best-known result
for the tabular case.

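This study analyzes Nash Q-learning with linear function approximation for learning policies that constitute a Nash equilibrium in multi-agent reinforcement learning, and provides finite-sample guarantees that indicate its sample efficiency. We find that its performance is comparable to that of single-agent reinforcement learning and falls short of the best result for tabular algorithms by a polynomial gap.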