This paper investigates how to incorporate expert observations (without
explicit information on expert actions) into a deep reinforcement learning
setting to improve sample efficiency. First, we formulate an augmented policy
loss combining a maximum entropy reinforcement learning objective with a
behavioral cloning loss that leverages a forward dynamics model. Then, we
propose an algorithm that automatically adjusts the weights of each component
in the augmented loss function. Experiments on a variety of continuous control
tasks demonstrate that the proposed algorithm outperforms various benchmarks by
effectively utilizing available expert observations.

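This study uses expert observations (without explicit information on expert actions) to improve the sample efficiency of deep reinforcement learning, and proposes an algorithm that automatically adjusts the weights of the components of the augmented loss function. On a variety of continuous control tasks, the algorithm outperforms other baselines by making effective use of the available expert observations.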

A Model-Based Approach for Improving Reinforcement Learning Efficiency Leveraging Expert Observations
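
The augmented policy loss described in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a SAC-style form for the maximum entropy term, a mean squared error for the behavioral cloning term, and fixed weights `w_rl` and `w_bc`; the abstract does not specify the rule that adjusts the weights automatically, so it is left out. All function names, and the toy dynamics and policies, are hypothetical.

```python
from typing import Callable, Sequence, Tuple

Vector = Sequence[float]


def bc_loss_from_observations(
    policy: Callable[[Vector], Vector],
    dynamics: Callable[[Vector, Vector], Vector],
    expert_pairs: Sequence[Tuple[Vector, Vector]],
) -> float:
    """Behavioral cloning loss that needs only expert observations.

    For each expert pair (obs, next_obs), the policy proposes an action, the
    forward dynamics model predicts the resulting observation, and the loss
    is the mean squared error against the expert's next observation.
    """
    total = 0.0
    for obs, next_obs in expert_pairs:
        action = policy(obs)               # the expert's own action is unknown
        predicted = dynamics(obs, action)  # forward model: (obs, action) -> next obs
        squared = sum((p - t) ** 2 for p, t in zip(predicted, next_obs))
        total += squared / len(next_obs)
    return total / len(expert_pairs)


def max_entropy_policy_loss(
    q_values: Sequence[float], log_probs: Sequence[float], temperature: float
) -> float:
    """Assumed SAC-style policy loss: mean of temperature * log pi(a|s) - Q(s, a)."""
    terms = [temperature * lp - q for q, lp in zip(q_values, log_probs)]
    return sum(terms) / len(terms)


def augmented_policy_loss(
    rl_loss: float, bc_loss: float, w_rl: float, w_bc: float
) -> float:
    """Weighted sum of the two components (fixed weights in this sketch)."""
    return w_rl * rl_loss + w_bc * bc_loss


# Toy setup (hypothetical): additive dynamics and two hand-written policies.
def toy_dynamics(obs: Vector, action: Vector) -> Vector:
    return [o + a for o, a in zip(obs, action)]


def half_step_policy(obs: Vector) -> Vector:
    return [0.5 * o for o in obs]


def zero_policy(obs: Vector) -> Vector:
    return [0.0 for _ in obs]


expert_pairs = [([1.0, 2.0], [1.5, 3.0])]
bc_good = bc_loss_from_observations(half_step_policy, toy_dynamics, expert_pairs)
bc_bad = bc_loss_from_observations(zero_policy, toy_dynamics, expert_pairs)
rl = max_entropy_policy_loss(q_values=[1.0, 3.0], log_probs=[-1.0, -2.0], temperature=0.5)
total = augmented_policy_loss(rl, bc_bad, w_rl=1.0, w_bc=2.0)
```

In this toy setup the policy that reproduces the expert's transition incurs a lower cloning loss than the policy that stays still, so the cloning term pulls the policy toward actions whose predicted outcome matches the expert's observations.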

In reinforcement learning (RL), sparse rewards can present a significant
challenge. Fortunately, expert actions can be utilized to overcome this issue.
However, acquiring explicit expert actions can be costly, and expert
observations are often more readily available. This paper presents a new
approach that uses expert observations for learning in robot manipulation tasks
with sparse rewards from pixel observations. Specifically, our technique
involves using expert observations as intermediate visual goals for a
goal-conditioned RL agent, enabling it to complete a task by successively
reaching a series of goals. We demonstrate the efficacy of our method in five
challenging block construction tasks in simulation and show that when combined
with two state-of-the-art agents, our approach can significantly improve their
performance while requiring 4-20 times fewer expert actions during training.
Moreover, our method is also superior to a hierarchical baseline.

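Using expert observations as intermediate visual goals for a reinforcement learning agent helps address the sparse-reward problem, improving performance while reducing the use of expert actions.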
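
The idea of reaching a series of goals taken from expert observations can be sketched as a simple control loop. This is an illustration under stated assumptions, not the method from the paper: low-dimensional 2D states stand in for pixel observations, the goal-conditioned policy is a hand-written rule rather than a trained agent, and every name (`follow_expert_goals`, `toy_policy`, `toy_step`) is hypothetical.

```python
from typing import Callable, Sequence, Tuple

State = Tuple[float, float]


def goal_reached(state: State, goal: State, tol: float = 1e-6) -> bool:
    """True when every coordinate is within `tol` of the goal."""
    return all(abs(s - g) <= tol for s, g in zip(state, goal))


def follow_expert_goals(
    start: State,
    expert_observations: Sequence[State],
    policy: Callable[[State, State], State],
    step: Callable[[State, State], State],
    max_steps_per_goal: int = 50,
) -> Tuple[State, int]:
    """Treat each expert observation as the next intermediate goal.

    The policy is conditioned on the current goal. Once that goal is reached,
    the loop moves on to the next expert observation. Returns the final state
    and the number of goals reached.
    """
    state = start
    reached = 0
    for goal in expert_observations:
        for _ in range(max_steps_per_goal):
            if goal_reached(state, goal):
                break
            state = step(state, policy(state, goal))
        if not goal_reached(state, goal):
            break  # step budget exhausted; stop at the first missed goal
        reached += 1
    return state, reached


# Toy stand-ins (hypothetical): move at most one unit per axis toward the goal.
def toy_policy(state: State, goal: State) -> State:
    return tuple(max(-1.0, min(1.0, g - s)) for s, g in zip(state, goal))


def toy_step(state: State, action: State) -> State:
    return tuple(s + a for s, a in zip(state, action))


expert_goals = [(2.0, 0.0), (2.0, 3.0)]
final_state, goals_reached = follow_expert_goals((0.0, 0.0), expert_goals, toy_policy, toy_step)
short_state, short_reached = follow_expert_goals(
    (0.0, 0.0), expert_goals, toy_policy, toy_step, max_steps_per_goal=1
)
```

With a generous step budget the toy agent reaches both goals in order. With a budget of one step per goal it stops before the first goal, which shows why each intermediate goal has to be reachable within the agent's horizon.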