In reinforcement learning, composing multiple tasks into cohesive, executable sequences remains challenging, even though the ability to (de)compose tasks is a linchpin in developing robotic systems capable of learning complex behaviors. Compositional reinforcement learning is beset with difficulties, including the high dimensionality of the problem space, the scarcity of rewards, and the lack of system robustness after task composition. To surmount these challenges, we view task composition through the prism of category theory, a mathematical discipline that studies structures and their compositional relationships. The categorical properties of Markov decision processes allow complex tasks to be untangled into manageable sub-tasks, which enables strategic reduction of dimensionality, facilitates more tractable reward structures, and bolsters system robustness. Experimental results support the categorical theory of reinforcement learning, showing that it enables skill reduction, reuse, and recycling when learning complex robotic arm tasks.

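This study addresses the difficulty of task composition in reinforcement learning and proposes a new approach that uses category theory to tackle challenges such as high task dimensionality, reward scarcity, and system fragility. It shows that the categorical properties of Markov decision processes can be used to decompose complex tasks into manageable sub-tasks, improve system robustness, and enable the reduction, reuse, and recycling of skills, thereby advancing the learning of complex robotic tasks.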

Reduce, Reuse, Recycle: Categories for Compositional Reinforcement Learning