Model-based offline Reinforcement Learning (RL) allows agents to fully
utilise pre-collected datasets without requiring additional or unethical
exploration. However, applying model-based offline RL to online systems
presents challenges, primarily due to the highly suboptimal (noise-filled) and
diverse nature of datasets generated by online systems. To tackle these issues,
we introduce the Causal Prompting Reinforcement Learning (CPRL) framework,
designed for highly suboptimal and resource-constrained online scenarios. In
its first phase, CPRL introduces the Hidden-Parameter Block Causal Prompting
Dynamic (Hip-BCPD) to model environmental dynamics. This
approach utilises invariant causal prompts and aligns hidden parameters to
generalise to new and diverse online users. In the second phase, a single
policy is trained to solve multiple tasks by combining reusable skills,
avoiding the need to train from scratch. Experiments
conducted across datasets with varying levels of noise, including
simulation-based and real-world offline datasets from the Dnurse APP,
demonstrate that our proposed method can make robust decisions in
out-of-distribution and noisy environments, outperforming contemporary
algorithms. Additionally, we separately verify the contributions of Hip-BCPD
and the skill-reuse strategy to the robustness of performance. We further
analyse the visualised structure of Hip-BCPD and the interpretability of
sub-skills. We release our source code and the first-ever real-world medical
dataset for precise medical decision-making tasks.
