We introduce AMAGO, an in-context Reinforcement Learning (RL) agent that uses sequence models to tackle the challenges of generalization, long-term memory, and meta-learning. Recent works have shown that off-policy learning can make in-context RL with recurrent policies viable. Nonetheless, these approaches require extensive tuning and limit scalability by creating key bottlenecks in agents' memory capacity, planning horizon, and model size. AMAGO revisits and redesigns the off-policy in-context approach to successfully train long-sequence Transformers over entire rollouts in parallel with end-to-end RL. Our agent is uniquely scalable and applicable to a wide range of problems. We demonstrate its strong performance empirically in meta-RL and long-term memory domains. AMAGO's focus on sparse rewards and off-policy data also allows in-context learning to extend to goal-conditioned problems with challenging exploration. When combined with a novel hindsight relabeling scheme, AMAGO can solve a previously difficult category of open-world domains, where agents complete many possible instructions in procedurally generated environments. We evaluate our agent on three goal-conditioned domains and study how its individual improvements connect to create a generalist policy.

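AMAGO is an in-context reinforcement learning agent that uses sequence models to address the challenges of generalization, long-term memory, and meta-learning. By redesigning the off-policy in-context approach, it can train long-sequence Transformers with end-to-end RL, and it shows strong empirical performance in meta-RL and long-term memory domains. Its focus on sparse rewards and off-policy data also lets in-context learning extend to goal-conditioned problems with challenging exploration.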

AMAGO: Scalable In-Context Reinforcement Learning for Adaptive Agents