We present Bayesian Team Imitation Learner (BTIL), an imitation learning algorithm to model the behavior of teams performing sequential tasks in Markovian domains. In contrast to existing multi-agent imitation learning techniques, BTIL explicitly models and infers the time-varying mental states of team members, thereby enabling learning of decentralized team policies from demonstrations of suboptimal teamwork. Further, to allow for sample- and label-efficient policy learning from small datasets, BTIL employs a Bayesian perspective and is capable of learning from semi-supervised demonstrations. We demonstrate and benchmark the performance of BTIL on synthetic multi-agent tasks as well as a novel dataset of human-agent teamwork. Our experiments show that BTIL can successfully learn team policies from demonstrations despite the influence of team members' (time-varying and potentially misaligned) mental states on their behavior.

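In summary: the paper proposes the Bayesian Team Imitation Learner (BTIL), an algorithm for modeling sequential team tasks in multi-agent domains. By explicitly modeling and inferring team members' mental states, it enables learning of decentralized team policies. In addition, BTIL takes a Bayesian perspective and can learn in a sample- and label-efficient way from small datasets and semi-supervised demonstrations. The experiments show that BTIL can successfully learn team policies from demonstrations, even though team members' mental states change over time and may lead to imperfect teamwork.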

Semi-supervised imitation learning: learning team policies from suboptimal demonstrations.