BriefGPT.xyz
Feb, 2022
Approximate Policy Iteration Using Bisimulation Metrics
Trusted Approximate Policy Iteration with Bisimulation Metrics
Mete Kemertas, Allan Jepson
TL;DR
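This paper proposes that Sinkhorn distances can be used to define bisimulation metrics, and that approximate policy iteration with bisimulation-based discretization lets actor-critic methods learn better state representations. Theoretical analysis and experimental results support these conclusions.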
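Since the summary relies on the Sinkhorn distance, a minimal sketch of how such a distance is commonly computed between two discrete distributions may help. This is not the authors' implementation: the function name `sinkhorn_distance`, the parameters `eps` and `n_iters`, and the toy cost matrices are all assumptions made for illustration. It uses the standard entropy-regularized optimal transport iteration with NumPy.

```python
import numpy as np

def sinkhorn_distance(a, b, cost, eps=0.05, n_iters=200):
    """Entropy-regularized transport cost between discrete distributions a and b.

    Hypothetical helper for illustration; `eps` is the regularization strength
    and `n_iters` the number of Sinkhorn scaling iterations (both assumed values).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    cost = np.asarray(cost, dtype=float)

    # Gibbs kernel derived from the ground cost.
    K = np.exp(-cost / eps)

    # Alternately rescale rows and columns to match the marginals a and b.
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)
        v = b / (K.T @ u)

    # Approximate transport plan and its total cost.
    plan = u[:, None] * K * v[None, :]
    return float(np.sum(plan * cost))

# Toy ground cost between two "states": 0 on the diagonal, 1 off it.
cost = np.array([[0.0, 1.0],
                 [1.0, 0.0]])
d_same = sinkhorn_distance([0.5, 0.5], [0.5, 0.5], cost, eps=0.05)
d_diff = sinkhorn_distance([0.9, 0.1], [0.1, 0.9], cost, eps=0.5)
```

In a bisimulation setting, `a` and `b` would presumably be next-state distributions of two states and `cost` the current estimate of the state metric, but that wiring is an assumption here and is not taken from the text above. Larger `eps` values converge faster but bias the result away from the exact transport cost.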
Abstract
Bisimulation metrics define a distance measure between states of a Markov decision process (MDP) based on a comparison of reward sequences. Due to this property, they provide theoretical guarantees in value function appr