On Bellman equations for continuous-time policy evaluation I: discretization and approximation

Jul, 2024

Wenlong Mou, Yuhua Zhu

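TL;DR: For the problem of computing the value function from a discretely observed trajectory of a continuous-time diffusion process, we develop a new class of algorithms based on easily implementable numerical schemes that are compatible with discrete-time reinforcement learning with function approximation. Using the underlying elliptic structure, we obtain a bounded approximation factor, even when the effective horizon diverges to infinity.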
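To make the problem setting concrete, here is a minimal sketch of policy evaluation from a discretely observed diffusion. It is not the scheme developed in the paper: it uses the naive first-order discretized Bellman equation V(x) ≈ η r(x) + e^{-βη} E[V(X_η) | X_0 = x] together with least-squares temporal difference (LSTD) and linear function approximation. Everything specific is an assumption made for illustration: the Ornstein-Uhlenbeck dynamics dX = -θX dt + σ dW, the reward r(x) = x², the features (1, x²), and the names `simulate_ou`, `features`, and `lstd_first_order`. For this toy model the value function has the closed form V(x) = x²/(β + 2θ) + σ²/(β(β + 2θ)), which gives a reference to compare against.

```python
import numpy as np


def simulate_ou(n_steps, eta, theta=1.0, sigma=np.sqrt(2.0), seed=0):
    """Sample dX = -theta*X dt + sigma dW on a time grid with step eta.

    Uses the exact Gaussian transition of the OU process, so the only
    discretization error in this example comes from the Bellman equation.
    """
    rng = np.random.default_rng(seed)
    decay = np.exp(-theta * eta)
    noise_sd = np.sqrt(sigma**2 / (2.0 * theta) * (1.0 - decay**2))
    z = rng.standard_normal(n_steps)
    x = np.empty(n_steps + 1)
    x[0] = 0.0
    for k in range(n_steps):
        x[k + 1] = decay * x[k] + noise_sd * z[k]
    return x


def features(x):
    """Hypothetical linear feature map phi(x) = (1, x^2)."""
    return np.stack([np.ones_like(x), x**2], axis=1)


def lstd_first_order(x, eta, beta, reward):
    """LSTD for the first-order discretized Bellman equation.

    Solves sum_k phi_k (phi_k - gamma * phi_{k+1})^T w = sum_k phi_k * eta * r_k
    with per-step discount gamma = exp(-beta * eta).
    """
    phi = features(x)
    phi_now, phi_next = phi[:-1], phi[1:]
    gamma = np.exp(-beta * eta)
    A = phi_now.T @ (phi_now - gamma * phi_next)
    b = phi_now.T @ (eta * reward(x[:-1]))
    return np.linalg.solve(A, b)


eta, beta = 0.01, 1.0
path = simulate_ou(n_steps=500_000, eta=eta)
w = lstd_first_order(path, eta, beta, reward=lambda x: x**2)

# Closed-form coefficients for theta = 1, sigma^2 = 2, beta = 1.
w_exact = np.array([2.0 / 3.0, 1.0 / 3.0])
print("estimated:", w, "closed form:", w_exact)
```

Note that the per-step discount e^{-βη} tends to 1 as the step size η shrinks, so the effective horizon 1/(1 - e^{-βη}) of the discrete-time problem grows without bound. That is the regime the TL;DR refers to when it mentions a bounded approximation factor despite a diverging effective horizon.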

Abstract

We study the problem of computing the value function from a discretely-observed trajectory of a continuous-time diffusion process. We develop a new class of algorithms based on easily implementable →