We propose a novel deep reinforcement learning (DRL) architecture for sequential decision processes under uncertainty, as encountered in inspection and maintenance (I&M) planning. Unlike other DRL algorithms for I&M planning, the proposed +RQN architecture dispenses with computing the belief state and instead handles erroneous observations directly. We apply the algorithm to a basic I&M planning problem for a one-component system subject to deterioration. In addition, we investigate the performance of Monte Carlo tree search on the same I&M problem and compare it to the +RQN. The comparison includes a statistical analysis of the policies resulting from the two methods, as well as a visualization of these policies in the belief space.
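
For context, the belief state maintained by belief-based methods is the posterior distribution over the hidden deterioration state, updated by a Bayes filter after each noisy observation. The following is a minimal sketch of that update, assuming a hypothetical three-state model (intact, damaged, failed); the matrices `T` and `O`, the function name `belief_update`, and all numerical values are illustrative assumptions and are not taken from this work.

```python
import numpy as np

# Hypothetical transition matrix T[s, s']: probability of moving from
# state s to state s' in one time step (illustrative values only).
T = np.array([
    [0.9, 0.1, 0.0],  # intact  -> intact / damaged / failed
    [0.0, 0.8, 0.2],  # damaged -> damaged / failed
    [0.0, 0.0, 1.0],  # failed is absorbing
])

# Hypothetical observation matrix O[s, o]: probability that an inspection
# reports o when the true state is s (illustrative values only).
O = np.array([
    [0.8, 0.2, 0.0],
    [0.2, 0.8, 0.0],
    [0.0, 0.0, 1.0],
])


def belief_update(belief, observation):
    """One Bayes-filter step: predict with T, then correct with O."""
    predicted = belief @ T                       # prior after deterioration
    unnormalized = predicted * O[:, observation]  # weight by likelihood
    total = unnormalized.sum()
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return unnormalized / total


belief = np.array([1.0, 0.0, 0.0])  # start in the intact state
for obs in [0, 1, 1]:               # a short sequence of inspection results
    belief = belief_update(belief, obs)
```

This update requires the transition and observation models to be known explicitly. It is the computation that the +RQN dispenses with, working on the erroneous observations directly instead.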

Belief-Free Deep Reinforcement Learning and Monte Carlo Tree Search for Inspection and Maintenance Planning