强化学习离线策略评估的实证研究

Nov, 2019

强化学习离线策略评估的实证研究

Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning

Cameron Voloshin, Hoang M. Le, Nan Jiang, Yisong Yue

TL;DR通过实验基准和实证研究，我们提供了针对强化学习中的离线策略评估（OPE）的实验基准和实证研究，重点研究了实验设计的多样性以启用OPE方法的应力测试。我们提供了一个完整的基准套件，以研究不同属性对方法性能的相互作用，并将结果总结为实践指南。我们提供的Caltech OPE 基准测试套件（COBS）是开源的，并邀请感兴趣的研究人员进一步贡献。

Abstract

off-policy policy evaluation (OPE) is the problem of estimating the online performance of a policy using only pre-collected historical data generated by another policy. Given the increasing interest in deploying learning-based methods for safety-critical applications, many recent OPE m