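TL;DR This paper proposes using population-based training (PBT) to adjust hyperparameters dynamically and improve model performance during training. It shows that the method achieves a higher win rate on 9x9 Go, and on 19x19 Go a higher win rate than a saturated version of AlphaZero (74% vs 47%).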
Abstract
AlphaZero has been very successful in many games. Unfortunately, it still consumes a huge amount of computing resources, the majority of which is spent in self-play.