We select policies for large Markov Decision Processes (MDPs) with compact first-order representations. We find policies that generalize well as the number of objects in the domain grows, potentially without bound. Existing dynamic-programming approaches based on flat, propositional, or first-order representations are either impractical here or do not naturally scale as the number of objects grows without bound. We implement and evaluate an alternative approach that induces first-order policies from training data constructed by solving small problem instances with PGraphplan (Blum & Langford, 1999). Our policies are represented as ensembles of decision lists, using a taxonomic concept language. This approach extends the work of Martin and Geffner (2000) to stochastic domains, ensemble learning, and a wider variety of problems. Empirically, we find "good" policies for several stochastic first-order MDPs that are beyond the scope of previous approaches. We also discuss the application of this work to the relational reinforcement-learning problem.
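
To make the policy representation concrete, the following is a minimal sketch of how an ensemble of decision lists might select an action. It is an illustration under stated assumptions, not the implementation evaluated here: the state encoding, the names `apply_list` and `ensemble_action`, the blocks-world-style facts, and the use of an unweighted plurality vote are all hypothetical, and plain Python predicates stand in for expressions of the taxonomic concept language.

```python
from collections import Counter
from typing import Callable, FrozenSet, List, Optional, Tuple

# Hypothetical encoding: a state is a set of ground facts such as ("on", "a", "b").
State = FrozenSet[Tuple[str, ...]]
Action = Tuple[str, str]                 # (action name, object), e.g. ("pickup", "c")
Concept = Callable[[State, str], bool]   # stand-in for a taxonomic class expression
Rule = Tuple[str, Concept]               # "apply this action to an object in the concept"
DecisionList = List[Rule]


def objects(state: State) -> List[str]:
    """All objects mentioned in the state, sorted so the choice is deterministic."""
    return sorted({arg for fact in state for arg in fact[1:]})


def apply_list(rules: DecisionList, state: State) -> Optional[Action]:
    """Return the action of the first rule whose concept contains some object."""
    for name, concept in rules:
        for obj in objects(state):
            if concept(state, obj):
                return (name, obj)
    return None  # no rule fires in this state


def ensemble_action(ensemble: List[DecisionList], state: State) -> Optional[Action]:
    """Unweighted plurality vote over the actions proposed by each decision list
    (the voting scheme is an assumption of this sketch)."""
    proposals = (apply_list(dl, state) for dl in ensemble)
    votes = Counter(a for a in proposals if a is not None)
    return votes.most_common(1)[0][0] if votes else None


# Toy concepts over blocks-world-style facts (illustrative only).
def clear_and_stacked(state: State, x: str) -> bool:
    return ("clear", x) in state and ("ontable", x) not in state


def clear_on_table(state: State, x: str) -> bool:
    return ("clear", x) in state and ("ontable", x) in state


state: State = frozenset({
    ("on", "a", "b"), ("ontable", "b"), ("clear", "a"),
    ("ontable", "c"), ("clear", "c"),
})

dl1: DecisionList = [("unstack", clear_and_stacked)]
dl2: DecisionList = [("pickup", clear_on_table)]
dl3: DecisionList = [("unstack", clear_and_stacked), ("pickup", clear_on_table)]

chosen = ensemble_action([dl1, dl2, dl3], state)
print(chosen)
```

Because each rule quantifies over whichever objects are present in the state, the same decision lists apply unchanged to instances with more objects, which is the sense in which such a policy can generalize as the domain grows.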

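We select policies for large Markov decision processes using compact first-order representations. We induce first-order policies from training data and represent them as ensembles of decision lists over a taxonomic concept language. We find that this approach performs well in stochastic domains, and we discuss its application to the relational reinforcement-learning problem.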

Inductive Policy Selection for First-Order MDPs