BriefGPT.xyz
Feb, 2023
Targeted Search Control in AlphaZero for Effective Policy Improvement
Alexandre Trudeau, Michael Bowling
TL;DR
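Applies Go-Exploit to search control in AlphaZero, improving sample efficiency and performance, and shows that it is a more effective search control strategy than alternatives such as the one used in KataGo.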
Abstract
AlphaZero is a self-play reinforcement learning algorithm that achieves superhuman play in chess, shogi, and Go via policy iteration. To b
→