We present Premier-TACO, a multitask feature representation learning approach
designed to improve few-shot policy learning efficiency in sequential
decision-making tasks. Premier-TACO leverages a subset of multitask offline
datasets for pretraining a general feature representation, which captures
critical environmental dynamics and is fine-tuned using minimal expert
demonstrations. It advances the temporal action contrastive learning (TACO)
objective, known for state-of-the-art results in visual control tasks, by
incorporating a novel negative example sampling strategy. This strategy is
crucial to significantly boosting TACO's computational efficiency, making
large-scale multitask offline pretraining feasible. Our extensive empirical
evaluation on a diverse set of continuous control benchmarks, including DeepMind
Control Suite, MetaWorld, and LIBERO, demonstrates Premier-TACO's effectiveness
in pretraining visual representations, significantly enhancing few-shot
imitation learning of novel tasks. Our code, pretraining data, and
pretrained model checkpoints will be released at
this https URL.
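The abstract does not spell out the objective, so what follows is only a minimal sketch in plain Python. It assumes the usual TACO formulation: an InfoNCE loss that scores a projection of the current latent state z_t together with the actions a_t..a_{t+K-1} against the future latent state z_{t+K}. The negative sampler is an assumption standing in for Premier-TACO's strategy. It draws a few negatives from a small window around the positive in the same trajectory instead of contrasting against every other batch element. The names `window_negatives` and `temporal_action_contrastive_loss` and the toy `project` function are hypothetical, and the latents are plain lists rather than encoder outputs.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)


def info_nce(query: Vector, positive: Vector, negatives: List[Vector],
             temperature: float = 0.1) -> float:
    """InfoNCE loss: -log softmax of the positive logit among positive + negatives."""
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[0]


def window_negatives(trajectory: List[Vector], pos_index: int, window: int,
                     num_neg: int, rng: random.Random) -> List[Vector]:
    """HYPOTHETICAL sampler: negatives come from the same trajectory, within
    `window` steps of the positive (the positive itself is excluded)."""
    lo = max(0, pos_index - window)
    hi = min(len(trajectory) - 1, pos_index + window)
    candidates = [i for i in range(lo, hi + 1) if i != pos_index]
    picks = rng.sample(candidates, min(num_neg, len(candidates)))
    return [trajectory[i] for i in picks]


def temporal_action_contrastive_loss(
        latents: List[Vector], actions: List[Vector], k: int,
        project: Callable[[Vector, List[Vector]], Vector],
        window: int = 3, num_neg: int = 2, seed: int = 0) -> float:
    """Average InfoNCE over anchors t: the query summarizes (z_t, a_t..a_{t+k-1})
    and the positive is z_{t+k}. `project` stands in for a learned projection head."""
    rng = random.Random(seed)
    losses = []
    for t in range(len(latents) - k):
        query = project(latents[t], actions[t:t + k])
        pos = t + k
        negatives = window_negatives(latents, pos, window, num_neg, rng)
        losses.append(info_nce(query, latents[pos], negatives))
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Toy data: latents follow z_{t+1} = z_t + a_t, and `toy_project` applies the same rule.
    rng = random.Random(1)
    acts = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(9)]
    lats = [[1.0, 0.0]]
    for a in acts:
        lats.append([z + d for z, d in zip(lats[-1], a)])

    def toy_project(z: Vector, a_seq: List[Vector]) -> Vector:
        return [zi + sum(a[i] for a in a_seq) for i, zi in enumerate(z)]

    print(temporal_action_contrastive_loss(lats, acts, k=2, project=toy_project))
```

Restricting negatives to a few nearby steps keeps the per-anchor cost at `num_neg` similarity computations rather than one per batch element. That is one plausible way such a sampler could make large-scale pretraining cheaper. Nearby states are also hard to tell apart, so they act as hard negatives.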

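Premier-TACO is a multitask feature representation learning method designed to improve the efficiency of few-shot policy learning in sequential decision-making tasks. It pretrains a general feature representation on a subset of multitask offline datasets, capturing key environment dynamics, and is fine-tuned with a minimal number of expert demonstrations. It improves on the temporal action contrastive learning (TACO) objective, which achieves state-of-the-art results in visual control tasks, by incorporating a novel negative example sampling strategy. This strategy greatly improves TACO's computational efficiency and thereby makes large-scale multitask offline pretraining possible. Extensive empirical evaluation on several continuous control benchmarks, including DeepMind Control Suite, MetaWorld, and LIBERO, shows that Premier-TACO is effective at pretraining visual representations and significantly improves few-shot imitation learning of new tasks. The code, pretraining data, and pretrained model checkpoints will be released at this URL.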

Premier-TACO: Pretraining Multitask Representation via Temporal Action-Driven Contrastive Loss

Model-based imitation learning (MBIL) is a popular reinforcement learning
method that improves sample efficiency on high-dimensional input sources, such as
images and videos. However, existing algorithms that follow the conventions of
MBIL research are easily misled by task-irrelevant information, especially
moving distractors in videos. To tackle this problem, we propose a new
algorithm, Separated Model-based Adversarial Imitation Learning (SeMAIL),
which decouples the environment dynamics into two parts according to their
task relevance, as determined by dependence on the agent's actions, and trains
the two parts separately. In this way, the agent can imagine its trajectories
and imitate the expert behavior efficiently in the task-relevant state space.
Our method achieves near-expert performance on various visual control tasks
with complex observations, as well as on more challenging tasks whose
backgrounds differ from those in the expert observations.
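As a rough illustration of the decoupling idea, the sketch below splits the latent state into two parts. The task-relevant part depends on the action, and the task-irrelevant part evolves without it. The sketch imagines trajectories in the task-relevant part only and scores states with a GAIL-style adversarial reward. The class `SeparatedDynamics`, the helper names, the toy linear dynamics, and the reward form -log(1 - D(s)) are all assumptions for illustration. They are not SeMAIL's actual networks or losses.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple

Vector = List[float]


@dataclass
class SeparatedDynamics:
    """HYPOTHETICAL two-branch latent model in the spirit of SeMAIL's decoupling."""
    # Task-relevant branch: its transition depends on the agent's action.
    relevant_step: Callable[[Vector, Vector], Vector]
    # Task-irrelevant branch (e.g. moving background distractors): no action input.
    irrelevant_step: Callable[[Vector], Vector]

    def step(self, s_rel: Vector, s_irr: Vector, action: Vector) -> Tuple[Vector, Vector]:
        """Advance both branches, e.g. when fitting the model to real observations."""
        return self.relevant_step(s_rel, action), self.irrelevant_step(s_irr)


def imagine(model: SeparatedDynamics, s_rel: Vector,
            policy: Callable[[Vector], Vector], horizon: int) -> List[Vector]:
    """Roll out only the task-relevant branch, so distractors never enter imagination."""
    trajectory = [s_rel]
    for _ in range(horizon):
        action = policy(trajectory[-1])
        trajectory.append(model.relevant_step(trajectory[-1], action))
    return trajectory


def adversarial_reward(expert_prob: float) -> float:
    """GAIL-style reward -log(1 - D(s)), where D(s) is the discriminator's probability
    that s came from the expert; the reward grows as s looks more expert-like."""
    return -math.log(max(1.0 - expert_prob, 1e-8))


if __name__ == "__main__":
    model = SeparatedDynamics(
        relevant_step=lambda s, a: [x + y for x, y in zip(s, a)],
        irrelevant_step=lambda s: [0.9 * x for x in s],
    )
    goal = [1.0, 1.0]
    policy = lambda s: [0.5 * (g - x) for g, x in zip(goal, s)]
    # Hypothetical discriminator: expert-likeness decays with squared distance to the goal.
    disc = lambda s: math.exp(-sum((g - x) ** 2 for g, x in zip(goal, s)))
    for state in imagine(model, [0.0, 0.0], policy, horizon=5):
        print(state, round(adversarial_reward(disc(state)), 3))
```

Because `imagine` never calls `irrelevant_step`, a distractor that changes the task-irrelevant state cannot change the imagined trajectory or its rewards in this toy setup.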

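We propose an algorithm named Separated Model-based Adversarial Imitation Learning (SeMAIL). It decomposes the environment dynamics into two parts according to task relevance and trains them separately. This enables the agent to efficiently imagine its trajectories and imitate expert behavior in the task-relevant state space, achieving near-expert performance on a variety of visual control tasks with complex observations.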

SeMAIL: Eliminating Distractors in Visual Imitation via Separated Models

We present MEM: Multi-view Exploration Maximization for tackling complex
visual control tasks. To the best of our knowledge, MEM is the first approach
that combines multi-view representation learning and intrinsic reward-driven
exploration in reinforcement learning (RL). More specifically, MEM first
extracts the specific and shared information of multi-view observations to form
high-quality features before performing RL on the learned features, enabling
the agent to fully comprehend the environment and choose better actions.
Furthermore, MEM transforms the multi-view features into intrinsic rewards
based on entropy maximization to encourage exploration. As a result, MEM can
significantly improve the sample efficiency and generalization ability of the
RL agent, facilitating the solution of real-world problems with high-dimensional
observations and sparse-reward spaces. We evaluate MEM on various tasks from
the DeepMind Control Suite and Procgen games. Extensive simulation results
demonstrate that MEM achieves superior performance and outperforms the
benchmark schemes while using a simple architecture and offering higher efficiency.
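The abstract says only that MEM turns multi-view features into intrinsic rewards via entropy maximization. One common way to do this, assumed here rather than taken from the paper, is a particle-based k-nearest-neighbour entropy estimate computed over a batch of features. The sketch below fuses a shared feature with per-view specific features by concatenation, which is also an assumption. It then rewards each sample by log(1 + distance to its k-th nearest neighbour). `fuse_views` and `knn_intrinsic_rewards` are hypothetical names.

```python
import math
from typing import List, Sequence

Vector = List[float]


def fuse_views(shared: Sequence[float], specifics: Sequence[Sequence[float]]) -> Vector:
    """HYPOTHETICAL fusion: concatenate the shared feature with each view-specific feature."""
    fused = list(shared)
    for view_feature in specifics:
        fused.extend(view_feature)
    return fused


def knn_intrinsic_rewards(features: Sequence[Sequence[float]], k: int = 3) -> List[float]:
    """Particle-based entropy proxy: reward each sample by log(1 + distance to its
    k-th nearest neighbour in the batch). Samples in sparsely visited regions of
    feature space get larger rewards, pushing the agent toward higher state entropy."""
    rewards = []
    for i, fi in enumerate(features):
        dists = sorted(math.dist(fi, fj) for j, fj in enumerate(features) if j != i)
        kth = dists[min(k, len(dists)) - 1]  # clamp k when the batch is small
        rewards.append(math.log(1.0 + kth))
    return rewards


if __name__ == "__main__":
    batch = [
        fuse_views([0.0], [[0.0], [0.0]]),
        fuse_views([0.1], [[0.0], [0.1]]),
        fuse_views([0.0], [[0.2], [0.0]]),
        fuse_views([3.0], [[2.0], [4.0]]),  # a rarely visited region of feature space
    ]
    print(knn_intrinsic_rewards(batch, k=1))
```

The log(1 + ·) form keeps the reward non-negative and dampens very large distances. In practice such a bonus would be added to the extrinsic reward with a weighting coefficient.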

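This work presents MEM: Multi-view Exploration Maximization, which, to the best of the authors' knowledge, is the first reinforcement learning method to combine multi-view representation learning with intrinsic reward-driven exploration. Experimental results show that MEM significantly improves the sample efficiency and generalization ability of reinforcement learning agents in high-dimensional environments and sparse-reward spaces, helping to solve complex real-world visual control tasks.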

Tackling Visual Control via Multi-View Exploration Maximization

Learned world models summarize an agent's experience to facilitate learning
complex behaviors. While learning world models from high-dimensional sensory
inputs is becoming feasible through deep learning, there are many potential
ways of deriving behaviors from them. We present Dreamer, a reinforcement
learning agent that solves long-horizon tasks from images purely by latent
imagination. We efficiently learn behaviors by propagating analytic gradients
of learned state values back through trajectories imagined in the compact state
space of a learned world model. On 20 challenging visual control tasks, Dreamer
exceeds existing approaches in data-efficiency, computation time, and final
performance.
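Dreamer's actor is trained to maximize value estimates over imagined trajectories. The sketch below computes such a target in plain Python. It assumes the standard recursive λ-return V_λ(s_t) = r_t + γ((1 - λ) v(s_{t+1}) + λ V_λ(s_{t+1})), bootstrapped with v(s_H) at the imagination horizon H. The function name and the default γ and λ are illustrative assumptions. In Dreamer itself these quantities are tensors, so gradients can flow back through the learned dynamics into the policy. Plain floats cannot show that analytic-gradient step.

```python
from typing import List, Sequence


def lambda_returns(rewards: Sequence[float], values: Sequence[float],
                   gamma: float = 0.99, lam: float = 0.95) -> List[float]:
    """Recursive lambda-returns over an imagined trajectory of length H.

    rewards: r_0 .. r_{H-1} predicted along the imagined rollout.
    values:  v(s_0) .. v(s_H); the last entry bootstraps beyond the horizon.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have exactly one more entry than rewards")
    returns = [0.0] * len(rewards)
    next_return = values[-1]  # V_lambda(s_H) := v(s_H)
    for t in reversed(range(len(rewards))):
        # Blend the one-step bootstrap v(s_{t+1}) with the longer lambda-return.
        next_return = rewards[t] + gamma * ((1 - lam) * values[t + 1] + lam * next_return)
        returns[t] = next_return
    return returns


if __name__ == "__main__":
    print(lambda_returns([1.0, 0.0, 1.0], [0.5, 0.5, 0.5, 2.0]))
```

With `lam=0` each target reduces to the one-step estimate r_t + γ v(s_{t+1}). With `lam=1` it becomes the discounted sum of imagined rewards, bootstrapped only at the horizon.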

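Building on a learned world model, the reinforcement learning agent Dreamer solves long-horizon tasks from images purely through latent imagination. It offers high data efficiency, short computation time, and strong final performance.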