Monte Carlo tree search (MCTS) has achieved state-of-the-art results in many
domains, such as Go and Atari games, when combined with deep neural networks
(DNNs). When more simulations are executed, MCTS can achieve higher performance
but also requires enormous amounts of CPU and GPU resources. However, not all
states require a long search to identify the best action that the agent
can find. For example, in 19x19 Go and NoGo, we found that for more than half
of the states, the best action predicted by the DNN remains unchanged even after
searching for 2 minutes. This implies that a significant amount of resources can
be saved if we are able to stop the search early when we are confident in
the current search result. In this paper, we propose to achieve this goal by
predicting the uncertainty of the current search status and using the result
to decide whether we should stop searching. With our algorithm, called Dynamic
Simulation MCTS (DS-MCTS), we can speed up a NoGo agent trained by AlphaZero
by 2.5 times while maintaining a similar winning rate. Also, under the same
average simulation count, our method can achieve a 61% winning rate against the
original program.
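
The stopping idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the checkpoint interval, the threshold, and the toy confidence measure (the visit share of the most-visited action) are all assumptions made for illustration. In DS-MCTS the confidence would come from a learned uncertainty predictor, which is replaced here by a placeholder callable.

```python
from typing import Callable, Dict, Tuple

VisitCounts = Dict[int, int]  # action id -> number of visits at the root


def dynamic_simulation_search(
    run_simulation: Callable[[VisitCounts], None],
    predict_confidence: Callable[[VisitCounts], float],
    max_simulations: int = 800,   # assumed budget, not from the paper
    check_every: int = 100,       # assumed checkpoint interval
    threshold: float = 0.9,       # assumed confidence threshold
) -> Tuple[int, int]:
    """Run simulations, stopping early once the predictor is confident.

    Returns (best_action, simulations_used).
    """
    if max_simulations < 1:
        raise ValueError("max_simulations must be at least 1")
    visit_counts: VisitCounts = {}
    used = 0
    while used < max_simulations:
        run_simulation(visit_counts)  # one MCTS simulation (placeholder)
        used += 1
        # At each checkpoint, ask the predictor whether the current best
        # action is unlikely to change with further search.
        if used % check_every == 0 and used < max_simulations:
            if predict_confidence(visit_counts) >= threshold:
                break
    best_action = max(visit_counts, key=visit_counts.get)
    return best_action, used


def toy_simulation(visit_counts: VisitCounts) -> None:
    """Hypothetical stand-in for a simulation: 9 of 10 visits go to action 0."""
    total = sum(visit_counts.values())
    action = 1 if total % 10 == 9 else 0
    visit_counts[action] = visit_counts.get(action, 0) + 1


def visit_share_confidence(visit_counts: VisitCounts) -> float:
    """Hypothetical stand-in for a learned predictor: top action's visit share."""
    total = sum(visit_counts.values())
    return max(visit_counts.values()) / total if total else 0.0
```

With these toy stand-ins, a threshold of 0.8 stops the search at the first checkpoint, while an unreachable threshold falls back to the full simulation budget.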

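This paper proposes an algorithm called Dynamic Simulation MCTS, which predicts the uncertainty of the current search status to decide whether to stop searching. It speeds up a NoGo agent by 2.5 times without lowering its winning rate and, under the same average simulation count, achieves a 61% winning rate against the original program.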