BriefGPT.xyz
Jun, 2020
Near-Optimal Reinforcement Learning with Self-Play
Yu Bai, Chi Jin, Tiancheng Yu
TL;DR
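This paper proposes an optimistic Nash Q-learning algorithm, together with a new Nash V-learning algorithm, for reinforcement learning in Markov games; the resulting sample complexity is lower than that of existing algorithms.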
Abstract
This paper considers the problem of designing optimal algorithms for reinforcement learning in two-player zero-sum games. We focus on self-play algorithms which learn the optimal policy by playing against itself