This work advances randomized exploration in reinforcement learning (RL) with function approximation, modeled by linear mixture MDPs. We establish the first prior-dependent Bayesian regret bound for RL with function approximation, and we refine the Bayesian regret analysis of posterior sampling for reinforcement learning (PSRL), obtaining an upper bound of $\mathcal{O}(d\sqrt{H^3 T \log T})$, where $d$ is the dimension of the transition kernel, $H$ is the planning horizon, and $T$ is the total number of interactions. This improves by a factor of $\mathcal{O}(\sqrt{\log T})$ on the previous benchmark (Osband and Van Roy, 2014) specialized to linear mixture MDPs. Our approach takes a value-targeted model learning perspective and introduces a decoupling argument and a variance reduction technique, moving beyond traditional analyses that rely on confidence sets and concentration inequalities to formalize Bayesian regret bounds more effectively.
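
For orientation, the quantities named in the abstract can be written out as a minimal sketch under standard conventions. The symbols $\phi$, $\theta^*$, $K$, $\pi_k$, $s_1^k$, and $V_1$ are notational assumptions introduced here for illustration and may differ from the paper's own notation; any dependence of the parameter on the stage $h$ is omitted. A linear mixture MDP assumes the transition kernel is linear in a known feature map, with an unknown parameter of dimension $d$:

$$
P(s' \mid s, a) = \langle \phi(s' \mid s, a), \theta^* \rangle, \qquad \theta^* \in \mathbb{R}^d .
$$

With $K$ episodes of length $H$, so that $T = KH$, the Bayesian regret averages the per-episode suboptimality over the prior on $\theta^*$ and the randomness of the algorithm:

$$
\mathrm{BayesRegret}(T) = \mathbb{E}\left[ \sum_{k=1}^{K} \left( V_1^{*}(s_1^k) - V_1^{\pi_k}(s_1^k) \right) \right],
$$

where, in PSRL, $\pi_k$ is the policy that is optimal for a parameter $\theta^k$ sampled from the posterior at the start of episode $k$. The bound stated above then reads $\mathrm{BayesRegret}(T) = \mathcal{O}(d\sqrt{H^3 T \log T})$.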

Prior-Dependent Analysis of Posterior Sampling Reinforcement Learning with Function Approximation