We introduce PACOH-RL, a novel model-based Meta-Reinforcement Learning
(Meta-RL) algorithm designed to efficiently adapt control policies to changing
dynamics. PACOH-RL meta-learns priors for the dynamics model, allowing swift
adaptation to new dynamics with minimal interaction data. Existing Meta-RL
methods require abundant meta-learning data, limiting their applicability in
settings such as robotics, where data is costly to obtain. To address this,
PACOH-RL incorporates regularization and epistemic uncertainty quantification
in both the meta-learning and task adaptation stages. When facing new dynamics,
we use these uncertainty estimates to effectively guide exploration and data
collection. Overall, this enables positive transfer, even when access to data
from prior tasks or dynamic settings is severely limited. Our experimental
results demonstrate that PACOH-RL outperforms model-based RL and model-based
Meta-RL baselines in adapting to new dynamic conditions. Finally, on a real
robotic car, we showcase the potential for efficient RL policy adaptation in
diverse, data-scarce conditions.

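PACOH-RL is a model-based meta-reinforcement learning algorithm for efficiently adapting control policies to changing dynamics. It meta-learns priors for the dynamics model to adapt quickly to new dynamics, and it uses regularization and epistemic uncertainty quantification to guide exploration and data collection. This enables positive transfer when data is limited and makes the method suitable for domains such as robotics. Experimental results show that PACOH-RL outperforms model-based RL and model-based Meta-RL baselines in adapting to new dynamics conditions, and experiments on a real robotic car demonstrate its potential for efficient RL policy adaptation under data-scarce conditions.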
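
To make the idea of uncertainty-guided exploration concrete, the following is a minimal sketch and not the PACOH-RL implementation. It swaps in a much simpler technique: a bootstrap ensemble of ridge-regularized linear dynamics models, where disagreement between ensemble members stands in for epistemic uncertainty and is added to the predicted reward as an exploration bonus. All names here (`fit_ensemble`, `predict`, `pick_action`, `reward_fn`, `bonus`) are hypothetical and chosen for illustration only; the meta-learned priors described above are not modeled.

```python
import numpy as np


def fit_ensemble(X, Y, n_members=5, ridge=1e-2, seed=0):
    """Fit ridge-regularized linear models on bootstrap resamples of (X, Y).

    Assumption: linear dynamics models, used purely for illustration.
    Returns an array of weights with shape (n_members, in_dim, out_dim).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    members = []
    for _ in range(n_members):
        idx = rng.integers(0, n, size=n)  # bootstrap resample with replacement
        Xb, Yb = X[idx], Y[idx]
        # Ridge solution: (Xb^T Xb + ridge * I)^{-1} Xb^T Yb
        W = np.linalg.solve(Xb.T @ Xb + ridge * np.eye(d), Xb.T @ Yb)
        members.append(W)
    return np.stack(members)


def predict(members, X):
    """Return the ensemble mean and the member disagreement (std) per input."""
    preds = np.einsum("nd,mdo->mno", X, members)  # (n_members, n_inputs, out_dim)
    return preds.mean(axis=0), preds.std(axis=0)


def pick_action(members, state, candidate_actions, reward_fn, bonus=1.0):
    """Pick the candidate action maximizing reward plus an uncertainty bonus."""
    k = len(candidate_actions)
    inputs = np.hstack([np.tile(state, (k, 1)), candidate_actions])
    mean_next, std_next = predict(members, inputs)
    scores = reward_fn(mean_next) + bonus * std_next.sum(axis=1)
    return candidate_actions[np.argmax(scores)]
```

Setting `bonus=0` in this sketch recovers purely greedy action selection under the mean model, while larger values steer data collection toward state-action pairs where the ensemble members disagree most.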