Random Actions vs Random Policies: Bootstrapping Model-Based Direct Policy Search

Oct, 2022

Random Actions vs Random Policies: Bootstrapping Model-Based Direct Policy Search

Elias Hanna, Alex Coninx, Stéphane Doncieux

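TL;DR: This paper studies how the initial data gathering method affects the learning of a dynamics model, and compares two initialization methods used in the literature. The results show that task-dependent factors can be detrimental to each method, and the authors suggest exploring hybrid approaches.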

Abstract

This paper studies the impact of the initial data gathering method on the subsequent learning of a dynamics model. Dynamics models approximate the true transition function of a given task, in order to perform