Efficiently learning equilibria with large state and action spaces in general-sum Markov games while overcoming the curse of multi-agency is a challenging problem. Recent works have attempted to solve this problem by employing independent linear function classes to approximate the marginal $Q$-value for each agent. However, existing sample complexity bounds under such a framework have a suboptimal dependency on the desired accuracy $\varepsilon$ or on the size of the action space. In this work, we introduce a new algorithm, Lin-Confident-FTRL, for learning coarse correlated equilibria (CCE) with local access to the simulator, i.e., the algorithm can interact with the underlying environment only at previously visited states. Up to a logarithmic dependence on the size of the state space, Lin-Confident-FTRL learns an $\varepsilon$-CCE with a provably optimal accuracy bound of $O(\varepsilon^{-2})$ and gets rid of the linear dependency on the action space, while scaling polynomially with the relevant problem parameters (such as the number of agents and the time horizon). Moreover, our analysis of Lin-Confident-FTRL generalizes the virtual policy iteration technique from the single-agent local planning literature, which yields a new computationally efficient algorithm with a tighter sample complexity bound when assuming random access to the simulator.
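To make the target notion concrete, the sketch below is a toy, single-state (normal-form) illustration of the general principle behind FTRL-based CCE learning: if each agent independently runs a no-regret FTRL learner on its own marginal losses, the time-averaged joint play is an approximate CCE, with a gap no larger than the worst agent's average regret. It is not Lin-Confident-FTRL: there is no simulator, no state, and no linear function approximation, and exact expected losses are assumed. The negative-entropy regularizer, the step size, the function names (`ftrl_entropy`, `cce_gap`, `independent_ftrl`), and the game's loss values are all hypothetical choices made for illustration.

```python
import math

def ftrl_entropy(cum_loss, eta):
    # FTRL with a negative-entropy regularizer on the probability simplex.
    # Its minimizer has the closed form p(a) ∝ exp(-eta * cum_loss[a]) (Hedge).
    m = min(cum_loss)  # shift by the minimum for numerical stability
    w = [math.exp(-eta * (c - m)) for c in cum_loss]
    s = sum(w)
    return [x / s for x in w]

def cce_gap(joint, loss1, loss2):
    # Largest expected gain any player obtains by ignoring the recommendation
    # drawn from `joint` and always playing one fixed action.
    # A joint distribution is an eps-CCE exactly when this gap is <= eps.
    n1, n2 = len(loss1), len(loss1[0])
    pairs = [(a, b) for a in range(n1) for b in range(n2)]
    v1 = sum(joint[a][b] * loss1[a][b] for a, b in pairs)
    v2 = sum(joint[a][b] * loss2[a][b] for a, b in pairs)
    dev1 = min(sum(joint[a][b] * loss1[d][b] for a, b in pairs) for d in range(n1))
    dev2 = min(sum(joint[a][b] * loss2[a][d] for a, b in pairs) for d in range(n2))
    return max(v1 - dev1, v2 - dev2, 0.0)

def independent_ftrl(loss1, loss2, T):
    # Each player runs FTRL on its own marginal expected losses, independently;
    # returns the average over T rounds of the product distributions p1 x p2.
    n1, n2 = len(loss1), len(loss1[0])
    eta1 = math.sqrt(8 * math.log(n1) / T)  # standard Hedge step size
    eta2 = math.sqrt(8 * math.log(n2) / T)
    cum1, cum2 = [0.0] * n1, [0.0] * n2
    joint = [[0.0] * n2 for _ in range(n1)]
    for _ in range(T):
        p1, p2 = ftrl_entropy(cum1, eta1), ftrl_entropy(cum2, eta2)
        for a in range(n1):
            for b in range(n2):
                joint[a][b] += p1[a] * p2[b] / T  # running average of p1 x p2
        for a in range(n1):
            cum1[a] += sum(p2[b] * loss1[a][b] for b in range(n2))
        for b in range(n2):
            cum2[b] += sum(p1[a] * loss2[a][b] for a in range(n1))
    return joint

# A small general-sum game with made-up losses in [0, 1].
loss1 = [[0.2, 0.9, 0.5], [0.6, 0.1, 0.8], [0.4, 0.7, 0.3]]
loss2 = [[0.7, 0.3, 0.6], [0.2, 0.8, 0.4], [0.9, 0.1, 0.5]]
joint = independent_ftrl(loss1, loss2, T=2000)
print(round(cce_gap(joint, loss1, loss2), 4))
```

With losses in $[0,1]$ and this step size, the standard Hedge bound gives each player an average regret of at most $\sqrt{\ln n / (2T)}$, so the printed gap shrinks at rate $O(T^{-1/2})$; equivalently, on the order of $\varepsilon^{-2}$ rounds suffice for an $\varepsilon$-CCE in this toy setting, which mirrors the $O(\varepsilon^{-2})$ accuracy dependence discussed above.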


RL in Markov Games with Independent Function Approximation: Improved Sample Complexity Bound under the Local Access Model