We consider the problem of computing optimal generalised policies for
relational Markov decision processes. We describe an approach combining some of
the benefits of purely inductive techniques with those of symbolic dynamic
programming methods. The latter reason about the optimal value function using
first-order decision-theoretic regression and formula rewriting, while the
former, when provided with a suitable hypotheses language, are capable of
generalising value functions or policies obtained for small instances. Our idea
is to use reasoning, and in particular classical first-order regression, to
automatically generate a hypotheses language dedicated to the domain at hand,
which is then used as input by an inductive solver. This approach avoids the
more complex reasoning of symbolic dynamic programming while focusing the
inductive solver's attention on concepts that are specifically relevant to the
optimal value function for the domain considered.
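
As a rough illustration of how regression can produce candidate concepts, the sketch below regresses a reward condition backwards through actions and collects every condition it reaches. It is a simplification under stated assumptions, not the method of the paper: it works on ground atoms with deterministic STRIPS-style actions rather than on first-order formulae, and the names `Action`, `regress`, `candidate_concepts` and the blocks-world atoms are all hypothetical.

```python
from typing import FrozenSet, List, NamedTuple, Optional, Set

Condition = FrozenSet[str]


class Action(NamedTuple):
    """Hypothetical ground, deterministic STRIPS-style action."""
    name: str
    pre: Condition      # atoms that must hold before the action
    add: Condition      # atoms made true by the action
    delete: Condition   # atoms made false by the action


def regress(goal: Condition, action: Action) -> Optional[Condition]:
    """Condition that must hold before `action` so that `goal` holds after it.

    Returns None if the action adds no goal atom (irrelevant) or deletes one.
    """
    if not (goal & action.add) or (goal & action.delete):
        return None
    return (goal - action.add) | action.pre


def candidate_concepts(reward: Condition, actions: List[Action],
                       depth: int) -> Set[Condition]:
    """Collect the conditions reached by regressing `reward` up to `depth` steps."""
    concepts = {reward}
    frontier = {reward}
    for _ in range(depth):
        next_frontier = set()
        for cond in frontier:
            for action in actions:
                result = regress(cond, action)
                if result is not None and result not in concepts:
                    concepts.add(result)
                    next_frontier.add(result)
        frontier = next_frontier
    return concepts


# Assumed toy blocks-world fragment, for illustration only.
stack = Action("stack(a,b)",
               pre=frozenset({"clear(a)", "clear(b)", "ontable(a)"}),
               add=frozenset({"on(a,b)"}),
               delete=frozenset({"clear(b)", "ontable(a)"}))
to_table = Action("to_table(a,c)",
                  pre=frozenset({"on(a,c)", "clear(a)"}),
                  add=frozenset({"ontable(a)", "clear(c)"}),
                  delete=frozenset({"on(a,c)"}))

reward = frozenset({"on(a,b)"})
concepts = candidate_concepts(reward, [stack, to_table], depth=2)
```

In this toy setting the collected conditions play the role of a domain-specific hypotheses language: an inductive learner would be restricted to features built from them instead of searching an unconstrained space.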

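We study the problem of computing optimal generalised policies for relational Markov decision processes and propose an approach that combines inductive techniques with symbolic dynamic programming methods: a hypotheses language relevant to the problem domain is generated automatically and used as input to an inductive solver, which avoids the complex reasoning of symbolic dynamic programming.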

Exploiting First-Order Regression in Inductive Policy Selection

Markov decision processes capture sequential decision making under
uncertainty, where an agent must choose actions so as to optimize long-term
reward. The paper studies efficient reasoning mechanisms for Relational Markov
Decision Processes (RMDPs), where world states have an internal relational
structure that can be naturally described in terms of objects and relations
among them. Two contributions are presented. First, the paper develops First
Order Decision Diagrams (FODDs), a new compact representation for functions over
relational structures, together with a set of operators to combine FODDs, and
novel reduction techniques to keep the representation small. Second, the paper
shows how FODDs can be used to develop solutions for RMDPs, where reasoning is
performed at the abstract level and the resulting optimal policy is independent
of domain size (number of objects) or instantiation. In particular, a variant
of the value iteration algorithm is developed by using special operations over
FODDs, and the algorithm is shown to converge to the optimal policy.
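
For orientation, the following is a minimal sketch of ordinary value iteration on a small, explicitly enumerated MDP. It is not the FODD-based variant described above, which reasons at the abstract relational level; it only shows the Bellman backup that such a variant lifts. The function name `value_iteration`, the three-state example and the discount factor of 0.9 are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# mdp[state][action] = list of (probability, next_state, reward) outcomes
MDP = Dict[str, Dict[str, List[Tuple[float, str, float]]]]


def q_value(outcomes: List[Tuple[float, str, float]],
            values: Dict[str, float], gamma: float) -> float:
    """Expected discounted return of one action given current value estimates."""
    return sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)


def value_iteration(mdp: MDP, gamma: float = 0.9, tol: float = 1e-8):
    """Repeat in-place Bellman backups until the largest change falls below `tol`."""
    values = {s: 0.0 for s in mdp}
    while True:
        delta = 0.0
        for s, actions in mdp.items():
            best = max(q_value(o, values, gamma) for o in actions.values())
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged value estimates.
    policy = {
        s: max(actions, key=lambda a: q_value(actions[a], values, gamma))
        for s, actions in mdp.items()
    }
    return values, policy


# Assumed toy problem: reach "goal" from "far" via "near".
mdp: MDP = {
    "far": {
        "go": [(0.8, "near", 0.0), (0.2, "far", 0.0)],
        "wait": [(1.0, "far", 0.0)],
    },
    "near": {
        "go": [(1.0, "goal", 1.0)],
        "wait": [(1.0, "near", 0.0)],
    },
    "goal": {"stay": [(1.0, "goal", 0.0)]},
}

values, policy = value_iteration(mdp)
```

In this example the greedy policy selects `go` in both non-goal states. A ground solver of this kind has to be rerun whenever the set of objects changes, which is the dependence on domain size that an abstract-level solution avoids.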

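The study shows that RMDPs can be solved using a new compact representation, FODDs: a value iteration algorithm is developed through operations over FODDs, and the algorithm is proven to converge to an optimal policy that is independent of domain size or instantiation.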