Automatic Trade-off Adaptation in Offline Reinforcement Learning

Jun, 2023

Automatic Trade-off Adaptation in Offline RL

Phillip Swazinna, Steffen Udluft, Thomas Runkler

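TL;DR: This paper proposes AutoLION, an improved offline reinforcement learning algorithm that can adapt its policy behavior at runtime. It uses an autopilot to find the right trade-off parameter for balancing conservatism against performance optimization.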

Abstract

Recently, offline RL algorithms have been proposed that remain adaptive at runtime. For example, the LION algorithm \cite{lion} provides the user with an interface to set the trade-off between behavior cloning an