We present counting reward automata: a finite state machine variant capable of modelling any reward function expressible as a formal language. Unlike previous approaches, which are limited to expressing tasks as regular languages, our framework allows tasks to be described by unrestricted grammars. We prove that an agent equipped with such an abstract machine can solve a larger set of tasks than agents using current approaches. We show that this increase in expressive power does not come at the cost of increased automaton complexity. We also present a selection of learning algorithms that exploit automaton structure to improve sample efficiency. We show that the state machines required in our formulation can be specified from natural language task descriptions using large language models. Empirical results demonstrate that our method outperforms competing approaches in terms of sample efficiency, automaton complexity, and task completion.
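To illustrate why counting adds expressive power without adding states, the following is a minimal sketch, not the paper's formal definition. It assumes a single integer counter, transitions guarded by a zero-test on that counter, and a hypothetical event alphabet (`a` for "collect", `b` for "deliver", `end` for "report"). The names `CountingRewardAutomaton`, `collect_then_deliver` and `episode_return` are illustrative only. The task rewards sequences of the form a^n b^n followed by `end` with n >= 1, a classic non-regular language, using four states regardless of n.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: (state, event, counter_is_zero) -> (next_state, counter_delta, reward)
Key = Tuple[str, str, bool]
Transition = Tuple[str, int, float]


class CountingRewardAutomaton:
    """Illustrative one-counter reward machine (an assumption, not the paper's definition)."""

    def __init__(self, transitions: Dict[Key, Transition], initial: str, sink: str = "reject"):
        self.transitions = transitions
        self.initial = initial
        self.sink = sink  # absorbing rejecting state used for undefined transitions
        self.reset()

    def reset(self) -> None:
        self.state = self.initial
        self.counter = 0

    def step(self, event: str) -> float:
        # The enabled transition depends on the state, the event and the counter's zero-test.
        key = (self.state, event, self.counter == 0)
        next_state, delta, reward = self.transitions.get(key, (self.sink, 0, 0.0))
        self.state = next_state
        self.counter += delta  # decrements are only defined on non-zero counters
        return reward


def collect_then_deliver() -> CountingRewardAutomaton:
    """Reward 1.0 for a^n b^n followed by 'end' (n >= 1); every other trace earns 0.0."""
    t: Dict[Key, Transition] = {}
    for zero in (True, False):
        t[("collect", "a", zero)] = ("collect", +1, 0.0)  # count collected items
    t[("collect", "b", False)] = ("deliver", -1, 0.0)  # first delivery needs a stock
    t[("deliver", "b", False)] = ("deliver", -1, 0.0)  # keep delivering while stock remains
    t[("deliver", "end", True)] = ("accept", 0, 1.0)  # everything delivered: reward
    return CountingRewardAutomaton(t, initial="collect")


def episode_return(crm: CountingRewardAutomaton, events: List[str]) -> float:
    crm.reset()
    return sum(crm.step(e) for e in events)


if __name__ == "__main__":
    crm = collect_then_deliver()
    for trace in (["a", "a", "b", "b", "end"], ["a", "a", "b", "end"], ["a", "b", "a", "b", "end"]):
        print(trace, episode_return(crm, trace))
```

A plain finite-state reward machine would need a separate state for every item count it must remember, so it could only handle n up to some fixed bound; here the counter carries that information and the state set stays fixed.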

Counting Reward Automata: Sample Efficient Reinforcement Learning Through the Exploitation of Reward Function Structure