Recent studies show that deep reinforcement learning (DRL) agents tend to overfit to the task on which they were trained and fail to adapt to minor environment changes. To expedite learning when transferring to unseen tasks, we propose a novel approach to representing the current task using reward machines (RMs): state-machine abstractions that induce subtasks based on the current task's rewards and dynamics. Our method provides agents with symbolic representations of the optimal transitions from their current abstract state and rewards them for achieving these transitions. These representations are shared across tasks, allowing agents to exploit knowledge of previously encountered symbols and transitions, thus enhancing transfer. Our empirical evaluation shows that our representations improve sample efficiency and few-shot transfer in a variety of domains.
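
To make the idea concrete, the following is a minimal sketch, not the implementation evaluated in this work. It assumes a reward machine can be stored as a table of labelled transitions, plans over that table with plain value iteration, and exposes the optimal next symbol as a one-hot context over a vocabulary shared across tasks. All names (`RewardMachine`, `plan_optimal_transitions`, `encode_context`, `shaped_reward`), the key/door/lava task, the discount factor, and the bonus value are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class RewardMachine:
    """Hypothetical reward machine: a finite-state abstraction of a task."""
    initial: str
    terminal: set
    delta: dict  # (abstract_state, symbol) -> (next_abstract_state, reward)


def plan_optimal_transitions(rm, gamma=0.9, sweeps=50):
    """Value iteration over the abstract graph; returns the best symbol per state."""
    states = {rm.initial} | set(rm.terminal)
    for (u, _), (v, _) in rm.delta.items():
        states |= {u, v}
    value = {u: 0.0 for u in states}
    non_terminal = states - set(rm.terminal)

    def outgoing(u):
        # Discounted return of each labelled transition leaving abstract state u.
        return {sym: r + gamma * value[v]
                for (s, sym), (v, r) in rm.delta.items() if s == u}

    for _ in range(sweeps):
        for u in non_terminal:
            q = outgoing(u)
            if q:
                value[u] = max(q.values())

    best = {}
    for u in non_terminal:
        q = outgoing(u)
        if q:
            best[u] = max(q, key=q.get)
    return best


def encode_context(best, u, vocabulary):
    """One-hot encoding of the optimal next symbol (all zeros if none exists)."""
    return [1.0 if best.get(u) == sym else 0.0 for sym in vocabulary]


def shaped_reward(env_reward, best, u, observed_symbol, bonus=0.1):
    """Add an assumed bonus when the agent achieves the planned transition."""
    achieved = observed_symbol is not None and best.get(u) == observed_symbol
    return env_reward + (bonus if achieved else 0.0)


# Illustrative task: pick up a key, then open a door, while avoiding lava.
VOCAB = ["key", "door", "lava"]  # symbols assumed to be shared across tasks
rm = RewardMachine(
    initial="u0",
    terminal={"u_goal", "u_fail"},
    delta={
        ("u0", "key"): ("u1", 0.0),
        ("u0", "lava"): ("u_fail", -1.0),
        ("u1", "door"): ("u_goal", 1.0),
    },
)
best = plan_optimal_transitions(rm)
context = encode_context(best, "u0", VOCAB)
```

In this sketch the context vector would be concatenated to the agent's observation, so that a policy trained on one task can reuse what it learned about a symbol such as `key` when the same symbol appears in a different reward machine.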

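To address the overfitting of deep reinforcement learning agents during task transfer and their poor adaptability to real-world environments, we propose a task representation method based on reward machines. The method uses the interaction between an abstract state graph and the task's rewards and dynamics to induce subtasks, enabling knowledge sharing across tasks and optimization of the learning process. Experiments show that the method improves sample efficiency and few-shot transfer performance across a range of domains.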

Contextual Pre-planning on Reward Machine Abstractions for Enhanced Transfer in Deep Reinforcement Learning