May, 2023

On the Value of Myopic Behavior in Policy Reuse

Kang Xu, Chenjia Bai, Shuang Qiu, Haoran He, Bin Zhao...

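TL;DR: This paper introduces SMEC, a framework that uses a hybrid value function architecture to evaluate the behaviors of prior policies. It adaptively aggregates the sharable short-term behaviors of prior policies with the long-term behaviors of the task policy to produce coordinated decisions. Experiments show that it outperforms existing methods and makes effective use of relevant prior policies.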

Abstract

Leveraging learned strategies in unfamiliar scenarios is fundamental to human intelligence. In reinforcement learning, rationally reusing the policies acquired from other tasks or human experts is critical for tackling problems that are difficult to learn from scratch. In this work, we