BriefGPT.xyz
Jul, 2024
On Bellman equations for continuous-time policy evaluation I: discretization and approximation
Wenlong Mou, Yuhua Zhu
TL;DR
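For the problem of computing the value function from a discretely observed trajectory of a continuous-time diffusion process, the authors develop a new class of algorithms based on easily implementable numerical schemes that are compatible with discrete-time reinforcement learning with function approximation. An approach based on the elliptic structure of the problem yields a bounded approximation factor, even when the effective horizon diverges to infinity.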
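To make the problem setting concrete, here is a minimal sketch of the naive first-order baseline, not the schemes proposed in the paper: treat the discretely observed trajectory as a discrete-time Markov chain with discount exp(-beta * eta) and per-step reward eta * r(x), then fit a linear value function by least-squares temporal difference (LSTD). Everything in it is an assumption made for illustration: the Ornstein-Uhlenbeck dynamics, the reward r(x) = x^2, the features (1, x^2), the step size, and all function names. This toy case was chosen because its exact value function is known in closed form, V(x) = sigma^2 / (beta (beta + 2 theta)) + x^2 / (beta + 2 theta), so the discretization error can be inspected directly.

```python
import numpy as np


def simulate_ou(n_steps, eta, theta=1.0, sigma=1.0, seed=0):
    """Sample dX = -theta X dt + sigma dW at times 0, eta, 2*eta, ...

    Uses the exact Gaussian transition of the OU process, so the only
    discretization error in this sketch comes from the Bellman equation.
    """
    rng = np.random.default_rng(seed)
    decay = np.exp(-theta * eta)
    stat_var = sigma**2 / (2.0 * theta)              # stationary variance
    noise_std = np.sqrt(stat_var * (1.0 - decay**2))
    noise = noise_std * rng.standard_normal(n_steps)
    x = np.empty(n_steps + 1)
    x[0] = np.sqrt(stat_var) * rng.standard_normal()  # start at stationarity
    for k in range(n_steps):
        x[k + 1] = decay * x[k] + noise[k]
    return x


def features(x):
    """Hypothetical feature map phi(x) = (1, x^2)."""
    return np.stack([np.ones_like(x), x**2], axis=1)


def lstd_naive(x, eta, beta, reward):
    """LSTD for the first-order discretized Bellman equation
    V(x) ~ eta * r(x) + exp(-beta * eta) * E[V(X_eta) | X_0 = x].
    """
    gamma = np.exp(-beta * eta)                       # per-step discount
    phi, phi_next = features(x[:-1]), features(x[1:])
    A = phi.T @ (phi - gamma * phi_next)
    b = phi.T @ (eta * reward(x[:-1]))
    return np.linalg.solve(A, b)


# Assumed toy parameters (not taken from the paper).
eta, beta, theta, sigma = 0.01, 1.0, 1.0, 1.0
x = simulate_ou(500_000, eta, theta, sigma)
w = lstd_naive(x, eta, beta, reward=lambda s: s**2)

# Closed-form coefficients of the exact continuous-time value function.
w_exact = np.array([sigma**2 / (beta * (beta + 2.0 * theta)),
                    1.0 / (beta + 2.0 * theta)])
print("LSTD coefficients: ", w)
print("exact coefficients:", w_exact)
```

In this baseline the fitted coefficients differ from the exact ones by a statistical error plus a bias of order eta that comes from the first-order Bellman discretization.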
Abstract
We study the problem of computing the value function from a discretely-observed trajectory of a continuous-time diffusion process. We develop a new class of algorithms based on easily implementable
→