Inverse Reinforcement Learning (IRL) techniques deal with the problem of
deducing a reward function that explains the behavior of an expert agent who is
assumed to act optimally in an underlying unknown task. In several problems of
interest, however, it is possible to observe the behavior of multiple experts
with different degrees of optimality (e.g., racing drivers whose skills range
from amateurs to professionals). For this reason, in this work, we extend the
IRL formulation to problems where, in addition to demonstrations from the
optimal agent, we can observe the behavior of multiple sub-optimal experts.
Given this problem, we first study the theoretical properties of the class of
reward functions that are compatible with a given set of experts, i.e., the
feasible reward set. Our results show that the presence of multiple sub-optimal
experts can significantly shrink the set of compatible rewards. Furthermore, we
study the statistical complexity of estimating the feasible reward set with a
generative model. To this end, we analyze a uniform sampling algorithm that
is minimax optimal whenever the sub-optimal experts' performance level is
sufficiently close to that of the optimal agent.
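
To make the notion of a feasible reward set concrete, the following minimal sketch checks whether a candidate reward is compatible with a set of experts in a small tabular MDP. For illustration only, it assumes that compatibility means two things. First, the optimal expert's policy is greedy with respect to the optimal Q-function. Second, each sub-optimal expert i has a known bound xi_i on its value gap, with V*(s) - V^{pi_i}(s) <= xi_i in every state. The sketch also treats the transition model P as known, whereas the statistical analysis above concerns estimating the feasible set from a generative model. The toy MDP and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical tabular encoding (illustrative, not taken from the paper):
# P[s][a][s2] = transition probability, r[s][a] = reward, pi[s] = action index.
Transitions = Sequence[Sequence[Sequence[float]]]
Rewards = Sequence[Sequence[float]]
Policy = Sequence[int]


def q_from_v(P: Transitions, r: Rewards, V: List[float], gamma: float) -> List[List[float]]:
    """One-step lookahead: Q(s, a) = r(s, a) + gamma * sum_s2 P(s2 | s, a) * V(s2)."""
    return [
        [r[s][a] + gamma * sum(p * V[s2] for s2, p in enumerate(P[s][a]))
         for a in range(len(P[s]))]
        for s in range(len(P))
    ]


def optimal_q(P: Transitions, r: Rewards, gamma: float, iters: int = 1000) -> List[List[float]]:
    """Value iteration; returns an approximation of the optimal Q-function Q*."""
    V = [0.0] * len(P)
    for _ in range(iters):
        V = [max(row) for row in q_from_v(P, r, V, gamma)]
    return q_from_v(P, r, V, gamma)


def policy_value(P: Transitions, r: Rewards, pi: Policy, gamma: float,
                 iters: int = 1000) -> List[float]:
    """Iterative policy evaluation of a deterministic policy pi."""
    V = [0.0] * len(P)
    for _ in range(iters):
        Q = q_from_v(P, r, V, gamma)
        V = [Q[s][pi[s]] for s in range(len(P))]
    return V


def is_compatible(P: Transitions, r: Rewards, gamma: float, pi_opt: Policy,
                  sub_experts: Sequence[Tuple[Policy, float]], tol: float = 1e-6) -> bool:
    """Assumed compatibility test (illustrative, not the paper's definition):
    pi_opt must be greedy w.r.t. Q*, and every sub-optimal expert (pi_i, xi_i)
    must satisfy V*(s) - V^{pi_i}(s) <= xi_i in every state s."""
    Q = optimal_q(P, r, gamma)
    v_star = [max(row) for row in Q]
    if any(Q[s][pi_opt[s]] < v_star[s] - tol for s in range(len(P))):
        return False
    for pi_i, xi_i in sub_experts:
        v_i = policy_value(P, r, pi_i, gamma)
        if any(v_star[s] - v_i[s] > xi_i + tol for s in range(len(P))):
            return False
    return True


# Toy MDP with two states: action 0 stays put, action 1 switches state.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
r = [[0.0, 0.0], [1.0, 0.0]]  # reward 1 only for staying in state 1
gamma = 0.9
pi_opt = [1, 0]  # move to state 1, then stay there
pi_sub = [0, 0]  # always stay: never leaves state 0 (value gap 9 there)

print(is_compatible(P, r, gamma, pi_opt, []))                # True
print(is_compatible(P, r, gamma, pi_opt, [(pi_sub, 1.0)]))   # False
print(is_compatible(P, r, gamma, pi_opt, [(pi_sub, 10.0)]))  # True
```

With the optimal expert alone, the reward is feasible. Adding the sub-optimal expert with a tight bound (xi = 1) rules it out, which is a miniature version of the shrinking effect described above. Note that the all-zero reward passes every such check, because under it every policy is optimal.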

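Given a problem in which the behavior of multiple sub-optimal experts can be observed, we extend inverse reinforcement learning (IRL) to this setting, study the theoretical properties of the reward functions compatible with a given set of experts, and analyze the statistical complexity of estimating the feasible reward set with a generative model. This analysis yields a uniform sampling algorithm that is minimax optimal when the sub-optimal experts' performance is sufficiently close to that of the optimal agent.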

具有次优专家的逆强化学习

Inverse Reinforcement Learning with Sub-optimal Experts

In the last decade, many different algorithms have been proposed to track a
generic object in videos. Running them on recent large-scale video datasets
can produce a large amount of diverse tracking behaviour. Recent trends in
Reinforcement Learning have shown that demonstrations from an expert agent can
be used efficiently to speed up policy learning. Taking inspiration
from such works and from the recent applications of Reinforcement Learning to
visual tracking, we propose two novel trackers: A3CT, which exploits
demonstrations of a state-of-the-art tracker to learn an effective tracking
policy, and A3CTD, which takes advantage of the same expert tracker to correct
its behaviour during tracking. Through an extensive experimental validation on
the GOT-10k, OTB-100, LaSOT, UAV123 and VOT benchmarks, we show that the
proposed trackers achieve state-of-the-art performance while running in
real-time.

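Building on recent trends in reinforcement learning and on demonstrations from an expert agent, we propose two novel trackers, A3CT and A3CTD. Both exploit an existing state-of-the-art tracker to track effectively, and both achieve state-of-the-art results on multiple benchmarks while running in real time.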

通过深度强化学习和专家演示实现的视觉追踪

Visual Tracking by means of Deep Reinforcement Learning and an Expert Demonstrator

Imitation learning is the process by which one agent tries to learn how to
perform a certain task using information generated by another, often
more-expert agent performing that same task. Conventionally, the imitator has
access to both state and action information generated by an expert performing
the task (e.g., the expert may provide a kinesthetic demonstration of object
placement using a robotic arm). However, requiring action information
prevents imitation learning from exploiting a large number of valuable existing
resources, such as online videos of humans performing tasks. To overcome this
issue, the specific problem of imitation from observation (IfO), in which the
imitator only has access to the state information (e.g., video frames)
generated by the expert, has recently garnered a great deal of attention. In this
paper, we provide a literature review of methods developed for IfO, and then
point out some open research problems and potential future work.
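
One way to approach IfO follows the idea behind behavioral cloning from observation (BCO). The imitator first learns an inverse dynamics model from its own exploration, where it does observe its actions. It then uses that model to infer the actions behind the expert's state-only transitions and applies ordinary behavioral cloning. The sketch below is a tabular toy under assumed conditions: a deterministic five-state line world and hypothetical function names. It is an illustration of that idea, not a method taken from the review itself.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy environment: positions 0..4 on a line;
# action 0 moves left, action 1 moves right (clipped at both ends).
N_STATES, ACTIONS = 5, (0, 1)


def step(s, a):
    return max(0, min(N_STATES - 1, s + (1 if a == 1 else -1)))


def learn_inverse_dynamics(n_samples, rng):
    """Record which action produced each (s, s_next) pair during the
    imitator's own random exploration (here the imitator sees its actions)."""
    counts = defaultdict(Counter)
    for _ in range(n_samples):
        s, a = rng.randrange(N_STATES), rng.choice(ACTIONS)
        counts[(s, step(s, a))][a] += 1
    # Map each observed transition to its most frequent action.
    return {pair: c.most_common(1)[0][0] for pair, c in counts.items()}


def label_demo(states, inv_dyn):
    """Turn a state-only expert demonstration into (state, action) pairs."""
    return [(s, inv_dyn[(s, s_next)]) for s, s_next in zip(states, states[1:])]


def clone_policy(pairs):
    """Tabular behavioral cloning: majority-vote action for each state."""
    votes = defaultdict(Counter)
    for s, a in pairs:
        votes[s][a] += 1
    return {s: c.most_common(1)[0][0] for s, c in votes.items()}


rng = random.Random(0)
inv_dyn = learn_inverse_dynamics(1000, rng)
expert_states = [0, 1, 2, 3, 4]  # e.g. positions read from video frames; no actions
policy = clone_policy(label_demo(expert_states, inv_dyn))
print(policy)  # {0: 1, 1: 1, 2: 1, 3: 1}
```

In this deterministic toy, the inferred action labels are exact. With stochastic dynamics or continuous states, the inverse model would typically be a learned classifier, and its errors would propagate into the cloned policy.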

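This paper provides a literature review of imitation from observation and points out several open research problems and feasible directions for future work.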