Daniel Jarrett, Ioana Bica, Mihaela van der Schaar
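TL;DR: This paper proposes a method for performing strictly batch imitation learning in a fully offline setting by learning from demonstrated behavior alone.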
Abstract
Consider learning a policy purely on the basis of demonstrated behavior---that is, with no access to reinforcement signals, no knowledge of transition dynamics, and no further interaction with the environment. This *strictly batch imitation learning* problem arises wherever live experi