We propose a new algorithm for model-based distributional reinforcement
learning (RL), and prove that it is minimax-optimal for approximating return
distributions with a generative model (up to logarithmic factors), resolving an
open question of Zhang et al. (2023). Our analysis provides new theoretical
results on categorical approaches to distributional RL, and also introduces a
new distributional Bellman equation, the stochastic categorical CDF Bellman
equation, which we expect to be of independent interest. We also provide an
experimental study comparing several model-based distributional RL algorithms,
with several takeaways for practitioners.
