Combining Reinforcement Learning (RL) with a prior controller can yield the
best of both worlds: RL can solve complex nonlinear problems, while the
control prior ensures safer exploration and speeds up training. Prior work
largely blends both components with a fixed weight, neglecting that the RL
agent's performance varies over the course of training and across regions of the
state space. We therefore advocate an adaptive strategy that dynamically
adjusts the weighting based on the RL agent's current capabilities. We propose
a new adaptive hybrid RL algorithm, Contextualized Hybrid Ensemble Q-learning
(CHEQ). CHEQ combines three key ingredients: (i) a time-invariant formulation
of the adaptive hybrid RL problem treating the adaptive weight as a context
variable, (ii) a weight adaptation mechanism based on the parametric uncertainty
of a critic ensemble, and (iii) ensemble-based acceleration for data-efficient
RL. Evaluating CHEQ on a car racing task reveals substantially stronger data
efficiency, exploration safety, and transferability to unknown scenarios than
state-of-the-art adaptive hybrid RL methods.
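
To make ingredients (i) and (ii) concrete, the following is a minimal sketch of
how an uncertainty-driven weight could blend the two controllers. It is an
illustration under stated assumptions, not the CHEQ implementation: the function
names, the use of the ensemble's standard deviation as the uncertainty measure,
the linear mapping, and the thresholds `u_low`, `u_high`, `w_min`, `w_max` are
all hypothetical choices made here for the example.

```python
from statistics import pstdev


def rl_weight_from_ensemble(q_estimates, u_low=0.1, u_high=1.0,
                            w_min=0.2, w_max=1.0):
    """Map critic-ensemble disagreement to an RL weight in [w_min, w_max].

    Assumption: uncertainty is the standard deviation of the ensemble's
    Q-value estimates for the current state-action pair, and the mapping
    from uncertainty to weight is linear with clipping.
    """
    uncertainty = pstdev(q_estimates)
    # Normalize the uncertainty to [0, 1] between the two thresholds.
    frac = (uncertainty - u_low) / (u_high - u_low)
    frac = min(1.0, max(0.0, frac))
    # High uncertainty -> low RL weight (rely more on the control prior).
    return w_max - frac * (w_max - w_min)


def blend_actions(rl_action, prior_action, weight):
    """Convex combination of the RL action and the control-prior action."""
    return [weight * a_rl + (1.0 - weight) * a_prior
            for a_rl, a_prior in zip(rl_action, prior_action)]


def contextualize(observation, weight):
    """Append the current weight to the observation as a context variable."""
    return list(observation) + [weight]


# Usage: an ensemble that agrees closely yields a high RL weight.
weight = rl_weight_from_ensemble([1.00, 1.02, 0.98])
action = blend_actions([0.5, -0.1], [0.2, 0.0], weight)
agent_input = contextualize([0.3, 0.7], weight)
```

Feeding the weight back to the agent as part of its input is what would let the
learning problem stay time-invariant even though the weight itself changes
during training.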

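Combining reinforcement learning with a prior controller can yield the best of
both worlds: reinforcement learning can solve complex nonlinear problems, while
the controller ensures safer exploration and speeds up training. This paper
proposes a new adaptive hybrid reinforcement learning algorithm that dynamically
adjusts the weighting to match the RL agent's current capabilities, thereby
improving data efficiency, exploration safety, and transferability to unknown
scenarios.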