May, 2022
Impartial Games: A Challenge for Reinforcement Learning
Bei Zhou, Søren Riis
TL;DR
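This paper introduces the AlphaZero and MuZero algorithms, examines their limitations, and proposes new bottleneck tests to address AlphaZero's insufficient learning ability in certain games. It finds that AlphaZero runs into serious problems when trying to solve the game of nim.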
Abstract
The AlphaZero algorithm and its successor MuZero have revolutionised several competitive strategy games, including chess, Go, and shogi, and video games like Atari, by learning to play these games better than any