Reward is the driving force for reinforcement-learning agents. This paper is dedicated to understanding the expressivity of reward as a way to capture tasks that we would want an agent to perform. We frame this study around three new abstract notions of "task" that might be desirable: (1) a set of acceptable behaviors, (2) a partial ordering over behaviors, or (3) a partial ordering over trajectories. Our main results prove that while reward can express many of these tasks, there exist instances of each task type that no Markov reward function can capture. We then provide a set of polynomial-time algorithms that construct a Markov reward function that allows an agent to optimize tasks of each of these three types, and correctly determine when no such reward function exists. We conclude with an empirical study that corroborates and illustrates our theoretical findings.

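This paper studies the expressivity of reward as a way to capture the tasks an agent should perform. For three new abstract notions of task that might be desirable (a set of acceptable behaviors, a partial ordering over behaviors, or a partial ordering over trajectories), it provides a set of polynomial-time algorithms that construct a Markov reward function allowing an agent to optimize tasks of each type, and that correctly determine when no such reward function exists. An empirical study then corroborates the theoretical findings.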

On the Expressivity of Markov Reward