The development of a generalist agent capable of solving a wide range of sequential decision-making tasks remains a significant challenge. We address this problem in a cross-agent setup where agents share the same observation space but differ in their action spaces. Our approach builds on the universal policy framework, which decouples policy learning into two stages: a diffusion-based planner that generates observation sequences and an inverse dynamics model that assigns actions to these plans. We propose a method for training the planner on a joint dataset composed of trajectories from all agents. This method offers the benefit of positive transfer by pooling data from different agents, while the primary challenge lies in adapting shared plans to each agent's unique constraints. We evaluate our approach on the BabyAI environment, covering tasks of varying complexity, and demonstrate positive transfer across agents. Additionally, we examine the planner's generalisation ability to unseen agents and compare our method to traditional imitation learning approaches. By training on a pooled dataset from multiple agents, our universal policy achieves an improvement of up to $42.20\%$ in task completion accuracy compared to a policy trained on a dataset from a single agent.
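The two-stage decomposition described above can be illustrated with a minimal sketch. It is not the authors' implementation. The planner here is a hypothetical hand-written stand-in, not a diffusion model. The observation space is assumed to be a position on a 1-D line. The agent names and action labels are invented for illustration. The sketch shows only the structure: one shared planner produces an observation sequence, and a per-agent inverse dynamics function maps each consecutive pair of observations to an action in that agent's own action space.

```python
from typing import Callable, Dict, List

Obs = int  # toy observation: a position on a 1-D line (assumption, not BabyAI)


def plan_observations(start: Obs, goal: Obs) -> List[Obs]:
    """Hypothetical stand-in for the shared planner.

    The planner in the text is diffusion-based; this toy version simply
    walks one cell at a time from start to goal so the example is runnable.
    """
    step = 1 if goal >= start else -1
    return list(range(start, goal + step, step))


def make_inverse_dynamics(delta_to_action: Dict[int, str]) -> Callable[[Obs, Obs], str]:
    """Build a toy inverse dynamics function for one agent.

    It maps an observed transition (obs -> next_obs) to the action label
    that this particular agent would use to produce it.
    """
    def inverse_dynamics(obs: Obs, next_obs: Obs) -> str:
        return delta_to_action[next_obs - obs]

    return inverse_dynamics


# Two illustrative agents: same observation space, different action spaces.
AGENTS: Dict[str, Callable[[Obs, Obs], str]] = {
    "agent_a": make_inverse_dynamics({-1: "left", 1: "right"}),
    "agent_b": make_inverse_dynamics({-1: "west", 0: "wait", 1: "east"}),
}


def act(agent: str, start: Obs, goal: Obs) -> List[str]:
    """Run the shared planner, then label the plan with agent-specific actions."""
    plan = plan_observations(start, goal)
    inverse_dynamics = AGENTS[agent]
    return [inverse_dynamics(o, n) for o, n in zip(plan, plan[1:])]


if __name__ == "__main__":
    print(act("agent_a", 0, 3))
    print(act("agent_b", 2, 0))
```

Because only the inverse dynamics function depends on the agent, the same plan can be reused across agents. In this toy setting, that is what allows the planner to be trained on pooled data.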

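This work addresses the challenge of developing a generalist agent that can handle a wide range of sequential decision-making tasks. We propose a two-stage universal policy framework, consisting of a diffusion-based planner and an inverse dynamics model, that lets agents sharing an observation space adapt to their own distinct action spaces. Our experiments show that training on a joint dataset pooled from different agents significantly improves task completion accuracy, by up to 42.20%.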

Making Universal Policies Universal