We consider the transfer of experience samples (i.e., tuples <s, a, s', r>)
in reinforcement learning (RL), collected from a set of source tasks to improve
the learning process in a given target task. Most related approaches
focus on selecting the most relevant source samples for solving the target
task, but then use all the transferred samples without further accounting for
the discrepancies between the task models. In this paper, we propose a
model-based technique that automatically estimates the relevance (importance
weight) of each source sample for solving the target task. In the proposed
approach, all the samples are transferred and used by a batch RL algorithm to
solve the target task, but their contribution to the learning process is
proportional to their importance weight. By extending the results for
importance weighting provided in supervised learning literature, we develop a
finite-sample analysis of the proposed batch RL algorithm. Furthermore, we
empirically compare the proposed algorithm to state-of-the-art approaches,
showing that it achieves better learning performance and is very robust to
negative transfer, even when some source tasks are significantly different from
the target task.
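
To make the weighting idea concrete, the following is a minimal sketch and not the algorithm analyzed in the paper. It assumes, purely for illustration, that the reward of each task is Gaussian with a known mean and a shared standard deviation, so that the importance weight of a source sample is the ratio of the target and source reward densities. The names `gaussian_pdf`, `importance_weight`, and `weighted_linear_fit`, as well as the toy data, are hypothetical.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a Gaussian N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def importance_weight(r, target_mean, source_mean, std):
    """Ratio p_target(r) / p_source(r) under the assumed Gaussian reward models."""
    return gaussian_pdf(r, target_mean, std) / gaussian_pdf(r, source_mean, std)

def weighted_linear_fit(features, targets, weights):
    """Minimise sum_i w_i * (theta * phi_i - y_i)^2 over a scalar theta."""
    num = sum(w * phi * y for phi, y, w in zip(features, targets, weights))
    den = sum(w * phi * phi for phi, w in zip(features, weights))
    return num / den

# Hypothetical samples: (feature, observed reward, mean reward of the source task).
# The third sample comes from a source task whose reward model differs from the target.
samples = [(1.0, 1.0, 1.0), (1.0, 1.2, 1.0), (1.0, 5.0, 5.0)]
target_mean = 1.0  # assumed mean reward of the target task

weights = [importance_weight(r, target_mean, src_mean, std=0.5)
           for _, r, src_mean in samples]
theta = weighted_linear_fit([phi for phi, _, _ in samples],
                            [r for _, r, _ in samples],
                            weights)
print(weights, theta)
```

All three samples enter the regression, but the sample from the mismatched source task receives a weight close to zero and barely affects the fit. This mirrors, in a toy setting, the claim that every sample is transferred while its contribution is scaled by its importance weight.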

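This paper proposes a model-based technique that, when transferring experience samples, automatically estimates the relevance of each sample to a given target task and uses importance weights to address negative transfer in RL problems. Experiments show that the approach achieves better learning performance and greater robustness than current state-of-the-art methods.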