Compared to traditional imitation learning methods such as DAgger and DART,
intervention-based imitation offers a more convenient and sample-efficient data
collection process to users. In this paper, we introduce Reinforced
Intervention-based Learning (ReIL), a framework consisting of a general
intervention-based learning algorithm and a multi-task imitation learning model
aimed at enabling non-expert users to train agents in real environments with
little supervision or fine-tuning. ReIL achieves this with an algorithm that
combines the advantages of imitation learning and reinforcement learning and a
model capable of concurrently processing demonstrations, past experience, and
current observations. Experimental results from real-world mobile robot
navigation challenges indicate that ReIL learns rapidly from sparse supervisor
corrections without suffering the deterioration in performance that is
characteristic of supervised learning-based methods such as HG-DAgger and IWR.
The results also demonstrate that, in contrast to other intervention-based
methods such as IARL and EGPO, ReIL can utilize an arbitrary reward function
for training without any additional heuristics.
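
To make the data collection process concrete, the following is a minimal sketch of a generic intervention-based rollout, in which a supervisor takes over only when the agent drifts too far and each transition is labeled with whether it was a correction. It is not ReIL's actual algorithm: the function `collect_episode`, the `Transition` record, the one-dimensional toy dynamics, and the intervention threshold are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Transition:
    state: float
    action: float
    intervened: bool  # True when the supervisor overrode the agent


def collect_episode(
    agent_policy: Callable[[float], float],
    supervisor_policy: Callable[[float], float],
    should_intervene: Callable[[float], bool],
    start_state: float,
    steps: int,
) -> List[Transition]:
    """Roll out the agent, letting the supervisor take over when needed."""
    state = start_state
    data: List[Transition] = []
    for _ in range(steps):
        intervened = should_intervene(state)
        policy = supervisor_policy if intervened else agent_policy
        action = policy(state)
        data.append(Transition(state, action, intervened))
        state = state + action  # toy 1-D dynamics (assumption)
    return data


# Toy usage: the agent always drifts right; the supervisor steers back to 0
# once the state leaves the (hypothetical) safe region |s| <= 2.
episode = collect_episode(
    agent_policy=lambda s: 1.0,
    supervisor_policy=lambda s: -s,
    should_intervene=lambda s: abs(s) > 2.0,
    start_state=0.0,
    steps=6,
)
corrections = [t for t in episode if t.intervened]
print(len(episode), len(corrections))
```

In this toy run the supervisor supplies only the sparse corrections, while the remaining transitions are the agent's own experience. A learner could treat the two kinds of data differently, for example by imitating the corrections and applying a reward signal to the rest.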

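This paper proposes ReIL, a multi-task learning framework based on reinforcement and intervention that aims to train agents in real environments without much supervision or tuning. Experimental results show that, compared with other intervention-based methods, ReIL can be trained with an arbitrary reward function without additional heuristics, and that it learns quickly from sparse supervision signals while maintaining performance.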

ReIL: A Framework for Reinforced Intervention-based Imitation Learning

In this paper, we investigate the problem of out-of-domain
visio-linguistic pretraining, where the pretraining data distribution differs
from that of downstream data on which the pretrained model will be fine-tuned.
Existing methods for this problem are purely likelihood-based, which leads to
spurious correlations and hurts generalization ability when the model is
transferred to out-of-domain downstream tasks. By spurious correlation, we mean that the
conditional probability of one token (object or word) given another one can be
high (due to dataset biases) without a robust (causal) relationship between
them. To mitigate such dataset biases, we propose a Deconfounded
Visio-Linguistic Bert framework, abbreviated as DeVLBert, to perform
intervention-based learning. We borrow the idea of the backdoor adjustment from
the research field of causality and propose several neural-network-based
architectures for Bert-style out-of-domain pretraining. The quantitative
results on three downstream tasks, Image Retrieval (IR), Zero-shot IR, and
Visual Question Answering, show that DeVLBert improves generalization
ability.
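
As a worked illustration of the backdoor adjustment mentioned above, the sketch below compares the observational quantity P(Y=1 | X=x) with the adjusted quantity P(Y=1 | do(X=x)) = sum over z of P(Y=1 | X=x, Z=z) P(Z=z) on a small discrete example. This is the textbook form of the adjustment, not DeVLBert's neural architecture. The confounder Z, all probability values, and the function names are made up for illustration. In this example Y depends only on Z, so X has no causal effect on Y.

```python
# Hypothetical toy distribution: Z confounds X and Y.
p_z = {0: 0.5, 1: 0.5}                      # P(Z=z)
p_x1_given_z = {0: 0.1, 1: 0.9}             # P(X=1 | Z=z)
p_y1_given_xz = {                           # P(Y=1 | X=x, Z=z), keyed by (x, z)
    (0, 0): 0.2, (1, 0): 0.2,
    (0, 1): 0.8, (1, 1): 0.8,
}


def p_x_given_z(x: int, z: int) -> float:
    """P(X=x | Z=z) for binary X."""
    return p_x1_given_z[z] if x == 1 else 1.0 - p_x1_given_z[z]


def observational(x: int) -> float:
    """P(Y=1 | X=x): weights each z by the posterior P(z | x)."""
    p_x = sum(p_x_given_z(x, z) * p_z[z] for z in p_z)
    joint = sum(p_y1_given_xz[(x, z)] * p_x_given_z(x, z) * p_z[z] for z in p_z)
    return joint / p_x


def backdoor(x: int) -> float:
    """P(Y=1 | do(X=x)): weights each z by the prior P(z) instead."""
    return sum(p_y1_given_xz[(x, z)] * p_z[z] for z in p_z)


for x in (0, 1):
    print(x, round(observational(x), 2), round(backdoor(x), 2))
```

With these numbers the observational probabilities differ sharply between X=0 and X=1 (about 0.26 versus 0.74), even though X does not influence Y. The adjusted probabilities are both 0.5. That gap is the kind of spurious correlation a purely likelihood-based objective would absorb.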

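This paper proposes the Deconfounded Visio-Linguistic Bert framework, which addresses the out-of-domain problem in visio-linguistic pretraining and mitigates dataset biases through intervention-based learning, thereby improving the model's generalization ability.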