Algorithmic reasoning is a fundamental cognitive ability that plays a pivotal role in problem-solving and decision-making processes. Reinforcement Learning (RL) has demonstrated remarkable proficiency in tasks such as motor control, handling perceptual input, and managing stochastic environments. These advancements have been enabled in part by the availability of benchmarks. In this work we introduce PUZZLES, a benchmark based on Simon Tatham's Portable Puzzle Collection, aimed at fostering progress in algorithmic and logical reasoning in RL. PUZZLES contains 40 diverse logic puzzles of adjustable sizes and varying levels of complexity; many puzzles also feature a diverse set of additional configuration parameters. The 40 puzzles provide detailed information on the strengths and generalization capabilities of RL agents. Furthermore, we evaluate various RL algorithms on PUZZLES, providing baseline comparisons and demonstrating the potential for future research. All the software, including the environment, is available at https://github.com/ETH-DISCO/rlp.

PUZZLES: A Benchmark for Neural Algorithmic Reasoning