BriefGPT.xyz
Nov, 2018
Temporal Regularization in Markov Decision Process
Pierre Thodoroff, Audrey Durand, Joelle Pineau, Doina Precup
TL;DR
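This paper introduces a reinforcement learning method based on temporal regularization and uses Markov chain concepts to formally characterize the bias the technique introduces. It illustrates various properties of temporal regularization in simple discrete and continuous MDPs and shows that the technique yields improvements even in high-dimensional Atari games.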
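The summary above only names the idea. As a rough illustration, the following is a minimal tabular TD(0) sketch under an assumed form of temporal regularization: the bootstrap target blends the value of the next state with the value of the state visited just before the current one, weighted by a coefficient `beta`. The function name `td0_temporal_reg`, the `(state, reward, next_state, done)` episode format, and the handling of the first step and of terminal transitions are our own assumptions for illustration, not the authors' code.

```python
def td0_temporal_reg(episodes, n_states, alpha=0.1, gamma=0.9, beta=0.0):
    """Tabular TD(0) policy evaluation with an assumed temporal-regularization term.

    `episodes` is a list of episodes; each episode is a list of
    (state, reward, next_state, done) transitions (hypothetical format).
    With beta = 0 this reduces to ordinary TD(0).
    """
    v = [0.0] * n_states
    for episode in episodes:
        prev_state = None  # state visited just before the current one
        for state, reward, next_state, done in episode:
            if done:
                # Assumption: no bootstrapping on terminal transitions.
                target = reward
            else:
                next_value = v[next_state]
                # Assumption: at the first step there is no previous state,
                # so fall back to the next-state value.
                prev_value = v[prev_state] if prev_state is not None else next_value
                # Blend forward (next state) and backward (previous state) values.
                target = reward + gamma * ((1 - beta) * next_value + beta * prev_value)
            v[state] += alpha * (target - v[state])
            prev_state = state
    return v


if __name__ == "__main__":
    # Deterministic chain 0 -> 1 -> 2 -> 3 (terminal), reward 1 on the last step.
    episode = [(0, 0.0, 1, False), (1, 0.0, 2, False), (2, 1.0, 3, True)]
    print(td0_temporal_reg([episode] * 500, n_states=4, alpha=0.5, beta=0.0))
    print(td0_temporal_reg([episode] * 500, n_states=4, alpha=0.5, beta=0.5))
```

On this toy chain, `beta=0` should approach the discounted returns (about 0.81, 0.9 and 1.0 for states 0 to 2), while `beta=0.5` pulls state 1 toward the lower value of state 0 (about 0.756). This is a concrete instance of the kind of bias that mixing in past values introduces, traded for smoother targets.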
Abstract
Several applications of reinforcement learning suffer from instability due to high variance. This is especially prevalent in high dimensional domains. Regularization is a commonly used technique in machine learning