With the ability to learn from static datasets, Offline Reinforcement
Learning (RL) emerges as a compelling avenue for real-world applications.
However, state-of-the-art offline RL algorithms perform sub-optimally when
confronted with limited data confined to specific regions within the state
space. The performance degradation is attributed to the inability of offline RL
algorithms to learn appropriate actions for rare or unseen observations. This
paper proposes a novel domain-knowledge-based regularization technique and
adaptively refines the initial domain knowledge, considerably boosting
performance on limited data with partially omitted states. The key insight is
that the regularization term mitigates erroneous actions for sparse samples and
unobserved states covered by domain knowledge. Empirical evaluations on
standard discrete environment datasets demonstrate a substantial average
performance increase of at least 27% compared to existing offline RL algorithms
operating on limited data.

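By regularizing with domain knowledge and adaptively refining the initial domain knowledge, this paper proposes a novel offline reinforcement learning (RL) algorithm that significantly improves performance on limited data; empirical evaluations on standard discrete environment datasets show a performance gain of at least 27% over existing offline RL algorithms.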