Efficiently learning equilibria with large state and action spaces in
general-sum Markov games while overcoming the curse of multi-agency is a
challenging problem. Recent works have attempted to solve this problem by
employing independent linear function classes to approximate the marginal
$Q$-value for each agent. However, existing sample complexity bounds under such
a framework have a suboptimal dependency on the desired accuracy $\varepsilon$
or the action space. In this work, we introduce a new algorithm,
Lin-Confident-FTRL, for learning coarse correlated equilibria (CCE) with local
access to the simulator, i.e., the learner can query the underlying environment
only at previously visited states. Up to a logarithmic dependence on the size of
the state space, Lin-Confident-FTRL learns an $\varepsilon$-CCE with a provably
optimal $O(\varepsilon^{-2})$ dependence on the accuracy and removes the linear
dependence on the size of the action space, while scaling polynomially with the
other relevant problem parameters (such as the number of agents and the time
horizon). Moreover, our analysis of
Lin-Confident-FTRL generalizes the virtual policy iteration technique from the
single-agent local planning literature, which yields a new computationally
efficient algorithm with a tighter sample complexity bound when random access
to the simulator is assumed.
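To make the $\varepsilon$-CCE objective concrete, consider the single-state (normal-form) special case. Here a joint distribution $\sigma$ over action profiles is an $\varepsilon$-CCE if $\max_i \max_{a_i'} \mathbb{E}_{a \sim \sigma}[u_i(a_i', a_{-i}) - u_i(a)] \le \varepsilon$, i.e., no agent gains more than $\varepsilon$ by ignoring $\sigma$ and committing to a fixed action. The Python sketch below is not Lin-Confident-FTRL. It assumes two players with exact payoff matrices in $[0, 1]$, full-information feedback, and no function approximation or simulator. Each player independently runs FTRL with an entropy regularizer (Hedge) over its own actions. The time-averaged product of their policies is then an approximate CCE whose gap is at most the larger per-player average regret. The helper names (`hedge_policy`, `run_independent_hedge`, `cce_gap`) and the step size $\eta = \sqrt{8 \ln n / T}$ are illustrative choices and do not come from the paper.

```python
import math
import random

# Illustrative sketch (assumption): independent Hedge/FTRL in a one-state,
# two-player general-sum game. This is NOT the paper's Lin-Confident-FTRL.


def hedge_policy(cum_payoff, eta):
    """FTRL with an entropy regularizer: softmax of eta * cumulative payoffs."""
    m = max(cum_payoff)  # shift for numerical stability
    w = [math.exp(eta * (c - m)) for c in cum_payoff]
    z = sum(w)
    return [x / z for x in w]


def run_independent_hedge(A, B, T, eta):
    """Both players run Hedge on their own marginal payoff vectors for T rounds.

    A[i][j], B[i][j]: payoffs to player 1 and player 2 when player 1 plays i
    and player 2 plays j. Returns sigma[i][j] = (1/T) * sum_t x_t[i] * y_t[j].
    """
    n, m = len(A), len(A[0])
    cum1, cum2 = [0.0] * n, [0.0] * m  # each learner stores only its own actions
    sigma = [[0.0] * m for _ in range(n)]  # built only to measure the CCE gap
    for _ in range(T):
        x = hedge_policy(cum1, eta)
        y = hedge_policy(cum2, eta)
        for i in range(n):
            for j in range(m):
                sigma[i][j] += x[i] * y[j] / T
        # Expected payoff of each pure action against the opponent's current mix.
        for i in range(n):
            cum1[i] += sum(A[i][j] * y[j] for j in range(m))
        for j in range(m):
            cum2[j] += sum(B[i][j] * x[i] for i in range(n))
    return sigma


def cce_gap(A, B, sigma):
    """Largest gain any player gets by committing to a fixed action instead of sigma."""
    n, m = len(A), len(A[0])
    v1 = sum(sigma[i][j] * A[i][j] for i in range(n) for j in range(m))
    v2 = sum(sigma[i][j] * B[i][j] for i in range(n) for j in range(m))
    p1 = [sum(sigma[i][j] for j in range(m)) for i in range(n)]  # marginal of player 1
    p2 = [sum(sigma[i][j] for i in range(n)) for j in range(m)]  # marginal of player 2
    dev1 = max(sum(A[k][j] * p2[j] for j in range(m)) for k in range(n))
    dev2 = max(sum(B[i][k] * p1[i] for i in range(n)) for k in range(m))
    return max(dev1 - v1, dev2 - v2, 0.0)


# Usage: a random 3x3 general-sum game with payoffs in [0, 1).
random.seed(0)
A = [[random.random() for _ in range(3)] for _ in range(3)]
B = [[random.random() for _ in range(3)] for _ in range(3)]
T = 2000
eta = math.sqrt(8 * math.log(3) / T)
sigma = run_independent_hedge(A, B, T, eta)
print(f"CCE gap after {T} rounds: {cce_gap(A, B, sigma):.4f}")
```

With payoffs in $[0, 1]$ and this step size, the standard Hedge regret bound guarantees a gap of at most $\sqrt{\ln n / (2T)}$, roughly 0.017 for $n = 3$ and $T = 2000$. Each learner keeps state only over its own actions; the joint table `sigma` is materialized here solely to evaluate the gap.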
