Patrick Emedom-Nnamdi, Abram L. Friesen, Bobak Shahriari, Nando de Freitas, Matt W. Hoffman
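TL;DR: This paper examines how, in offline and human-expert-in-the-loop settings, expert-provided data and information can be used to improve the sample complexity and coverage of actor-critic methods, and validates the approach on the DeepMind Control Suite.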
Abstract
Standard approaches to sequential decision-making exploit an agent's ability to continually interact with its environment and improve its control policy. However, due to safety, ethical, and practicality constraints, this type of trial-and-error experimentation is often infeasible in m