In this paper, we present a Distributionally Robust Markov Decision Process
(DRMDP) approach for addressing the dynamic epidemic control problem. The
Susceptible-Exposed-Infectious-Recovered (SEIR) model is widely used to
represent the stochastic spread of infectious diseases such as COVID-19.
Markov Decision Processes (MDPs) offer a mathematical framework for identifying
optimal actions, such as vaccination and transmission-reducing interventions, to
combat disease spread under the SEIR model. However, uncertainties in these
scenarios demand a more robust approach that relies less on error-prone
assumptions. The primary objective of our study is to introduce a
new DRMDP framework that allows for an ambiguous distribution of transition
dynamics. Specifically, we consider the worst-case distribution of these
transition probabilities within a decision-dependent ambiguity set. To overcome
the computational complexity of determining policies, we propose an efficient
Real-Time Dynamic Programming (RTDP) algorithm that computes optimal policies
for the reformulated DRMDP model in an accurate, timely, and scalable manner.
A comparative analysis against the classic MDP model shows that the DRMDP
achieves lower proportions of infected and susceptible individuals at a reduced
cost.
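The RTDP idea can be illustrated on a toy problem. This is a minimal sketch under stated assumptions, not the authors' model or reformulation: it assumes a hypothetical five-level infection MDP (level 0 = eradicated, absorbing), a per-step cost equal to the infection level plus a hypothetical 1.5 for intervening, and an L1 ball around the nominal transition distribution whose radius depends on the chosen action, standing in for the decision-dependent ambiguity set. As in standard RTDP, values start from an admissible lower bound and are backed up only at states visited by simulated greedy trajectories; here each backup uses the worst-case distribution in the ball, and trajectories are sampled from that worst case (one of several reasonable choices). All names (`worst_case`, `robust_q`, `rtdp`, `RADIUS`, `STEP`) are hypothetical.

```python
import random

# Hypothetical toy model: infection levels 0..4, level 0 = eradicated (absorbing).
N, GOAL, GAMMA = 5, 0, 0.9
ACTIONS = (0, 1)                       # 0 = no intervention, 1 = intervene
RADIUS = (0.3, 0.1)                    # hypothetical L1 radius per action (decision-dependent)
STEP = {0: (0.1, 0.3, 0.6), 1: (0.6, 0.3, 0.1)}  # P(level down), P(stay), P(level up)


def nominal(s, a):
    """Nominal next-level distribution q(. | s, a)."""
    p = [0.0] * N
    if s == GOAL:
        p[GOAL] = 1.0
        return p
    down, stay, up = STEP[a]
    p[s - 1] += down
    p[s] += stay
    p[min(s + 1, N - 1)] += up
    return p


NOMINAL = [[nominal(s, a) for a in ACTIONS] for s in range(N)]
# Per-step cost: infection level plus a hypothetical intervention cost of 1.5.
COST = [[0.0 if s == GOAL else s + 1.5 * a for a in ACTIONS] for s in range(N)]


def worst_case(q, v, radius):
    """Maximize p.v over distributions p on supp(q) with ||p - q||_1 <= radius."""
    p = list(q)
    support = [j for j in range(len(q)) if q[j] > 0]
    top = max(support, key=lambda j: v[j])
    budget = min(radius / 2.0, 1.0 - p[top])
    p[top] += budget
    # Remove the same mass from the lowest cost-to-go states first.
    for j in sorted(support, key=lambda j: v[j]):
        if j == top or budget <= 0:
            continue
        take = min(p[j], budget)
        p[j] -= take
        budget -= take
    return p


def robust_q(s, a, V):
    """Immediate cost plus discounted worst-case expected cost-to-go."""
    p = worst_case(NOMINAL[s][a], V, RADIUS[a])
    return COST[s][a] + GAMMA * sum(pj * vj for pj, vj in zip(p, V))


def rtdp(start, trials=2000, horizon=60, seed=0):
    """Real-Time Dynamic Programming with robust (worst-case) Bellman backups."""
    rng = random.Random(seed)
    V = [0.0] * N  # admissible lower bound, since every cost is nonnegative
    for _ in range(trials):
        s = start
        for _ in range(horizon):
            if s == GOAL:
                break
            qs = [robust_q(s, a, V) for a in ACTIONS]
            a = min(ACTIONS, key=lambda b: qs[b])    # greedy action
            V[s] = qs[a]                             # back up only the visited state
            p = worst_case(NOMINAL[s][a], V, RADIUS[a])
            s = rng.choices(range(N), weights=p)[0]  # simulate under the worst case
    return V


def robust_value_iteration(tol=1e-10):
    """Synchronous robust value iteration over all states, for comparison."""
    V = [0.0] * N
    while True:
        new = [min(robust_q(s, a, V) for a in ACTIONS) for s in range(N)]
        if max(abs(x - y) for x, y in zip(new, V)) < tol:
            return new
        V = new


if __name__ == "__main__":
    print(rtdp(start=4))
    print(robust_value_iteration())
```

Comparing `rtdp(start=4)` with the synchronous `robust_value_iteration()` is a quick sanity check: on this small model the start-state values should agree closely, while RTDP only spends backups on states its trials actually reach, which is what makes it attractive when the state space is large.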

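This paper proposes a Distributionally Robust Markov Decision Process (DRMDP) approach to the dynamic epidemic control problem and uses a Real-Time Dynamic Programming (RTDP) algorithm to compute optimal policies for the new DRMDP model, yielding vaccination and transmission-reduction measures against COVID-19 that are predicted to perform better.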
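Both summaries describe controlling SEIR dynamics through vaccination and transmission-reducing measures. The sketch below shows a deterministic, discrete-time (forward-Euler) SEIR step with those two controls; it is an illustration only, since the paper's model is stochastic, and the parameter names and values (`beta`, `sigma`, `gamma`, `vaccination`, `reduction`, the initial population) are assumptions. Vaccination is assumed to move susceptibles directly to the recovered compartment, and the transmission reduction is assumed to scale the contact rate `beta`.

```python
def seir_step(state, beta, sigma, gamma, vaccination=0.0, reduction=0.0, dt=1.0):
    """One forward-Euler step of a controlled SEIR model (total population is conserved)."""
    S, E, I, R = state
    N = S + E + I + R
    new_exposed = (1.0 - reduction) * beta * S * I / N * dt  # transmission, scaled down by interventions
    vaccinated = vaccination * S * dt                         # assumed: S -> R directly
    new_infectious = sigma * E * dt                           # end of the latent period
    recovered = gamma * I * dt
    return (S - new_exposed - vaccinated,
            E + new_exposed - new_infectious,
            I + new_infectious - recovered,
            R + recovered + vaccinated)


def simulate(days, state=(990.0, 0.0, 10.0, 0.0), **params):
    """Roll the model forward and return the full (S, E, I, R) trajectory."""
    trajectory = [state]
    for _ in range(days):
        state = seir_step(state, **params)
        trajectory.append(state)
    return trajectory


if __name__ == "__main__":
    base = dict(beta=0.5, sigma=0.2, gamma=0.1)  # hypothetical rates per day
    for reduction in (0.0, 0.5):
        peak = max(I for _, _, I, _ in simulate(200, reduction=reduction, **base))
        print(f"reduction={reduction}: peak infectious = {peak:.1f}")
```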

Decision-Dependent Distributionally Robust Markov Decision Process Method in Dynamic Epidemic Control

The distributionally robust Markov Decision Process (MDP) approach seeks a
policy that maximizes the expected total reward under the most adversarial
distribution of the uncertain parameters. In this paper, we study
distributionally robust MDPs whose ambiguity sets for the uncertain parameters
take a form that readily incorporates both generalized-moment and
statistical-distance information about the uncertainty. In this way, we
generalize existing work on distributionally robust MDPs with
generalized-moment-based and statistical-distance-based ambiguity sets,
bringing information from the former class, such as moments and dispersions,
into the latter class, which depends critically on empirical observations of
the uncertain parameters. We show that, under this form of ambiguity set, the
resulting distributionally robust MDP remains tractable under mild technical
conditions. Specifically, a distributionally robust policy can be constructed
by solving a sequence of one-stage convex optimization subproblems.
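To make the one-stage subproblem concrete, the sketch below uses a plain Kullback-Leibler ball {p : KL(p || q) <= rho} around a nominal distribution q. This is a simple statistical-distance set chosen for illustration, not the moment-plus-distance format studied in the paper. Because rewards are maximized, the adversary minimizes the expected value-to-go, and for this set that worst case has a well-known one-dimensional concave dual in a multiplier `lam > 0`, solved here by golden-section search. The names `kl_worst_case_mean`, `kl_robust_value_iteration`, `R`, `P`, and `rho` are hypothetical.

```python
import math


def kl_worst_case_mean(q, v, rho, iters=100):
    """min E_p[v] subject to KL(p || q) <= rho, solved through the 1-D concave dual

        sup_{lam > 0}  -lam * rho - lam * log E_q[exp(-v / lam)].
    """
    pairs = [(qi, vi) for qi, vi in zip(q, v) if qi > 0]
    mean = sum(qi * vi for qi, vi in pairs)
    vmin = min(vi for _, vi in pairs)
    if rho <= 0 or mean - vmin <= 1e-12:
        return mean

    def dual(lam):
        # Shift by vmin (log-sum-exp trick) so exp() never overflows.
        s = sum(qi * math.exp(-(vi - vmin) / lam) for qi, vi in pairs)
        return vmin - lam * rho - lam * math.log(s)

    # The maximizer satisfies lam * rho <= mean - vmin, which gives a finite bracket.
    lo, hi = 1e-12, (mean - vmin) / rho
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):  # golden-section search on the concave dual
        m1 = hi - ratio * (hi - lo)
        m2 = lo + ratio * (hi - lo)
        if dual(m1) < dual(m2):
            lo = m1
        else:
            hi = m2
    return dual(0.5 * (lo + hi))


def kl_robust_value_iteration(R, P, rho, gamma=0.9, sweeps=300):
    """Each backup solves one one-stage convex subproblem per (state, action) pair."""
    n = len(R)
    V = [0.0] * n
    for _ in range(sweeps):
        V = [max(R[s][a] + gamma * kl_worst_case_mean(P[s][a], V, rho)
                 for a in range(len(R[s])))
             for s in range(n)]
    return V
```

The bracket follows from two inequalities: by Jensen's inequality the dual objective never exceeds E_q[v] - lam * rho, and as lam approaches 0 it tends to the minimum of v on the support of q, so the maximizer must satisfy lam * rho <= E_q[v] - min v. With `rho = 0` the subproblem collapses to the nominal expectation, so the same routine also yields the classic (non-robust) values for comparison.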

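This study examines distributionally robust MDPs that achieve the maximal expected total reward under the most adversarial distribution of the uncertain parameters. By building generalized-moment and statistical-distance information about the uncertainty into the format of the ambiguity set, it extends existing work on generalized-moment-based and statistical-distance-based ambiguity sets, bringing moment information into the latter class, and thereby proposes a new ambiguity-set format for describing the uncertainty space. Under this format, when mild technical conditions hold, a distributionally robust policy can be constructed by solving a sequence of one-stage convex optimization subproblems.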