This thesis explores the field of reinforcement learning and builds on
existing methods to improve learning in high-dimensional, complex
environments. It addresses these goals by
decomposing learning tasks in a hierarchical fashion kno
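This paper uses a latent variable model to cast hierarchical imitation learning as a parameter inference problem, and theoretically characterizes the EM approach proposed by Daniel et al. (2016). The performance guarantee of the population-level algorithm is analyzed as an intermediate step, and the algorithm is shown to converge with high probability to a norm ball around the true parameter under certain regularity conditions. To the best of our knowledge, this is the first performance guarantee for a hierarchical imitation learning algorithm that observes only primitive state-action pairs.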