A major challenge in decision-making domains with large state spaces is selecting actions that maximize utility. In recent years, approaches such as reinforcement learning (RL) and search algorithms have tackled this issue successfully, despite their differences. RL defines a learning framework in which an agent explores and interacts with its environment. Search algorithms provide a formalism for searching for a solution. However, it is often difficult to evaluate the performance of such approaches in a practical way. Motivated by this problem, we focus on one game domain, Connect-4, and develop a novel evolutionary framework to evaluate three classes of algorithms: RL, Minimax, and Monte Carlo tree search (MCTS). The contribution of this paper is threefold: i) we implement advanced versions of these algorithms and provide a systematic comparison with their standard counterparts, ii) we develop a novel evaluation framework, which we call the Evolutionary Tournament, and iii) we conduct an extensive evaluation of the relative performance of each algorithm. We evaluate different metrics and show that MCTS achieves the best results in terms of win percentage, whereas Minimax and Q-Learning rank second and third, respectively, although Q-Learning is the fastest to make a decision.

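Effectively selecting utility-maximizing actions is a major challenge in decision-making domains with large state spaces. This paper focuses on one game domain, Connect-4, and develops a novel evolutionary framework to evaluate three classes of algorithms: reinforcement learning (RL), Minimax, and Monte Carlo tree search (MCTS). The study finds that MCTS achieves the best results in terms of win percentage, while Minimax and Q-Learning rank second and third, respectively, although Q-Learning is the fastest at making decisions.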

A comparison framework for evaluating advanced Minimax, Q-Learning, and MCTS in Connect-4 with evolutionary algorithms