In model-based reinforcement learning, simulated experience from the learned model is often treated as equivalent to experience from the real environment. However, when the model is inaccurate, it can catastrophically interfere with policy learning. Alternatively, the agent might learn about the model's accuracy and selectively use it only when it can provide reliable predictions. We empirically explore model uncertainty measures for selective planning and show that the best results require distribution-insensitive inference to estimate the uncertainty over model-based updates. To that end, we propose and evaluate bounding-box inference, which operates on bounding boxes around sets of possible states and other quantities. We find that bounding-box inference can reliably support effective selective planning.
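As a rough illustration of operating on bounding boxes around sets of possible states, the sketch below propagates a box over the predicted next state through a linear action-value function to bound a one-step TD target, then turns the width of that bound into a planning weight. This is a minimal sketch under stated assumptions, not the paper's implementation: the linear form of Q, the function names (`linear_q_bounds`, `td_target_bounds`, `planning_weight`), and the exponential weighting with a `temperature` parameter are all hypothetical choices made for the example.

```python
import numpy as np

def linear_q_bounds(W, b, s_lo, s_hi):
    """Bound Q(s, a) = W[a] @ s + b[a] for every s in the box [s_lo, s_hi].

    Assumption: Q is linear in the state. Positive weights reach their
    minimum at s_lo, negative weights reach their minimum at s_hi.
    """
    W_pos = np.maximum(W, 0.0)
    W_neg = np.minimum(W, 0.0)
    q_lo = W_pos @ s_lo + W_neg @ s_hi + b
    q_hi = W_pos @ s_hi + W_neg @ s_lo + b
    return q_lo, q_hi

def td_target_bounds(r_lo, r_hi, s_lo, s_hi, W, b, gamma):
    """Bound r + gamma * max_a Q(s', a) over the reward and next-state boxes.

    Requires gamma >= 0. The max over actions of the per-action lower
    (upper) bounds is a valid lower (upper) bound on max_a Q(s', a).
    """
    q_lo, q_hi = linear_q_bounds(W, b, s_lo, s_hi)
    return r_lo + gamma * np.max(q_lo), r_hi + gamma * np.max(q_hi)

def planning_weight(target_lo, target_hi, temperature=1.0):
    """Hypothetical weighting: shrink the model-based update as the
    bound on the TD target gets wider."""
    return float(np.exp(-(target_hi - target_lo) / temperature))

# Usage: two state features, two actions (all values are made up).
W = np.array([[1.0, -1.0], [0.5, 0.5]])
b = np.array([0.0, 0.0])
lo, hi = td_target_bounds(1.0, 1.0, np.array([0.0, 0.0]),
                          np.array([1.0, 2.0]), W, b, gamma=0.9)
weight = planning_weight(lo, hi)
```

Because the bounds depend only on the extent of the box and not on any assumed distribution inside it, a wide box yields a small weight regardless of how the model's predictions are distributed, which is the sense in which this kind of inference is distribution-insensitive.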


Bounding-Box Inference for Error-Aware Model-Based Reinforcement Learning