Developing agents that can leverage planning abilities during their
decision-making and learning processes is critical to the advancement of
Artificial Intelligence. Recent work has demonstrated the effectiveness of
combining tree-based search methods with self-play learning mechanisms. Yet
these methods typically face scaling challenges due to the sequential nature of
their search. While practical engineering solutions can partly overcome these
challenges, they still demand extensive computational resources, which hinders
their applicability. In
this paper, we introduce SMX, a model-based planning algorithm that utilises
scalable Sequential Monte Carlo methods to create an effective self-learning
mechanism. Grounded in the theoretical framework of control as inference, SMX
benefits from robust theoretical underpinnings. Its sampling-based search
approach makes it adaptable to environments with both discrete and continuous
action spaces. Furthermore, SMX allows for high parallelisation and can run on
hardware accelerators to optimise computing efficiency. SMX achieves a
statistically significant improvement in performance over AlphaZero. Used as an
improvement operator for a model-free policy, it also matches or exceeds top
model-free methods across both continuous and discrete environments.

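SMX is a model-based planning algorithm that uses scalable Sequential Monte Carlo methods to create an effective self-learning mechanism. Its sampling-based search makes it applicable to environments with both discrete and continuous action spaces, and it supports high parallelisation and optimised computing efficiency. Compared with AlphaZero, SMX shows a significant improvement in performance, and it matches or exceeds top model-free methods in both continuous and discrete environments.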
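
To make the idea of sampling-based search under control as inference more concrete, the following is a minimal sketch of a generic Sequential Monte Carlo planner in NumPy. It is not the SMX procedure itself. The function name `smc_plan`, the toy model and reward, the fixed Gaussian proposal (standing in for a learned policy), the temperature, and the resample-at-every-step schedule are all assumptions made for illustration. Every particle is updated with the same array operations, which is the property that makes this style of search easy to parallelise.

```python
import numpy as np

def smc_plan(state, model, reward_fn, n_particles=256, horizon=10,
             temperature=0.1, action_std=1.0, seed=0):
    """Generic SMC planner for a 1-D continuous action (illustrative only)."""
    rng = np.random.default_rng(seed)
    states = np.full(n_particles, state, dtype=float)
    first_actions = np.zeros(n_particles)
    for t in range(horizon):
        # Propose one action per particle from a fixed Gaussian prior
        # (hypothetical stand-in for a learned policy).
        actions = rng.normal(0.0, action_std, size=n_particles)
        if t == 0:
            first_actions = actions.copy()
        # Advance all particles through the model in one vectorised call.
        states = model(states, actions)
        # Control as inference: weight each particle by exp(reward / temperature).
        log_w = reward_fn(states) / temperature
        w = np.exp(log_w - log_w.max())  # subtract max for numerical stability
        w /= w.sum()
        # Multinomial resampling: keep promising particles, drop poor ones.
        idx = rng.choice(n_particles, size=n_particles, p=w)
        states = states[idx]
        first_actions = first_actions[idx]
    # After resampling, weights are uniform, so a plain mean is the estimate.
    return float(first_actions.mean())

# Hypothetical toy problem: move a point from 0.0 towards a target at 1.0.
def toy_model(s, a):
    return s + 0.1 * a

def toy_reward(s):
    return -(s - 1.0) ** 2

action = smc_plan(0.0, toy_model, toy_reward)
print(action)
```

In this toy setting the returned first action should point towards the target, because particles whose early actions move the state closer to 1.0 receive larger weights and survive resampling more often. Both the choice to return the mean first action and the choice to resample at every step are simplifications.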