We consider partially observable Markov decision processes (POMDPs) modeling an agent that needs a supply of a certain resource (e.g., electricity stored in batteries) to operate correctly. The resource is consumed by the agent's actions and can be replenished only in certain states. The agent aims to minimize the expected cost of reaching some goal while preventing resource exhaustion, a problem we call \emph{resource-constrained goal optimization} (RSGO). We take a two-step approach to the RSGO problem. First, using techniques from formal methods, we design an algorithm that computes a \emph{shield} for a given scenario: a procedure that observes the agent and prevents it from using actions that might eventually lead to resource exhaustion. Second, we augment the POMCP heuristic search algorithm for POMDP planning with our shields to obtain an algorithm that solves the RSGO problem. We implement our algorithm and present experiments showing its applicability to benchmarks from the literature.
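
To make the notion of a shield concrete, the following is a minimal sketch under strong simplifying assumptions; it is not the algorithm of the paper. It assumes a fully observable model (no beliefs), integer action costs, and that the resource is refilled to full capacity upon entering a reload state. All names (`min_safe_levels`, `allowed_actions`, the example model) are hypothetical. The sketch computes, by fixed-point iteration, the minimum resource level from which exhaustion can be avoided forever, and then uses these levels to filter actions.

```python
import math


def need(target, levels, reloads, capacity):
    """Resource required on arrival at `target` (assumed semantics:
    entering a usable reload state refills the resource to capacity)."""
    if target in reloads:
        return 0 if levels[target] <= capacity else math.inf
    return levels[target]


def min_safe_levels(actions, reloads, capacity):
    """actions: state -> {action: (cost, list of possible successors)}.
    Returns state -> minimum level that allows avoiding exhaustion forever
    (math.inf if no level within capacity suffices)."""
    levels = {s: 0 for s in actions}
    while True:
        new = {}
        for s, acts in actions.items():
            # Best action: pay its cost, then survive the worst successor.
            best = min(
                (cost + max(need(t, levels, reloads, capacity) for t in succ)
                 for cost, succ in acts.values()),
                default=math.inf,
            )
            new[s] = best if best <= capacity else math.inf
        if new == levels:  # fixed point reached
            return levels
        levels = new


def allowed_actions(actions, reloads, capacity, levels, state, level):
    """The shield: keep only actions after which exhaustion stays avoidable."""
    return [
        a for a, (cost, succ) in actions[state].items()
        if cost + max(need(t, levels, reloads, capacity) for t in succ) <= level
    ]


# Hypothetical three-state example; "R" is the only reload state.
ACTIONS = {
    "A": {"go": (2, ["B"]), "stay": (1, ["A"])},
    "B": {"home": (1, ["R"]), "wander": (2, ["A"])},
    "R": {"out": (1, ["A"])},
}
RELOADS = {"R"}
CAPACITY = 4
LEVELS = min_safe_levels(ACTIONS, RELOADS, CAPACITY)
```

In this toy model, calling `allowed_actions(ACTIONS, RELOADS, CAPACITY, LEVELS, "A", 3)` keeps only `go`, because `stay` would leave too little resource to reach the reload state afterwards. A planner such as POMCP could query a filter of this kind before expanding an action; in the partially observable setting of the paper, the shield must reason about what the agent can observe rather than about the exact state.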

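This work considers partially observable Markov decision processes (POMDPs) and studies the problem of minimizing the cost of reaching a goal under a resource constraint. It designs an algorithm that computes a "shield" for a given scenario and combines this shield with a heuristic search algorithm to solve the problem. Experiments demonstrate the practicality of the algorithm.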

Shielding in Resource-Constrained Goal POMDPs