The balance between exploration and exploitation is a key problem for reinforcement learning methods, especially for Q-learning. In this paper, a fidelity-based probabilistic Q-learning (FPQL) approach is presented to address this problem naturally and is applied to learning control of quantum systems. In this approach, fidelity is adopted to help direct the learning process, and the probability of selecting each action in a given state is updated iteratively as learning proceeds. This yields a natural exploration strategy rather than a prescribed one that relies on manually configured parameters. A probabilistic Q-learning (PQL) algorithm is first presented to demonstrate the basic idea of probabilistic action selection. The FPQL algorithm is then presented for learning control of quantum systems. Two examples (a spin-1/2 system and a Λ-type atomic system) are used to test the performance of the FPQL algorithm. The results show that the FPQL algorithm attains a better balance between exploration and exploitation, and that it can also avoid locally optimal policies and accelerate the learning process.
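
The idea of probabilistic action selection can be sketched as a tabular learner that samples each action from per-state probabilities P(a|s) instead of using an ε-greedy or temperature-based rule. This is a minimal sketch under stated assumptions: the class name ProbabilisticQLearner, the step size k, and the rule that a positive TD error shifts probability mass toward the chosen action are hypothetical illustrations, not the PQL update rule from the paper.

```python
import random


class ProbabilisticQLearner:
    """Tabular Q-learning whose behavior policy samples actions from
    per-state probabilities P(a|s) that are updated along with Q.

    The probability update is an illustrative assumption, not the
    rule from the paper.
    """

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.95, k=0.2, seed=0):
        self.n_actions = n_actions
        self.alpha = alpha  # Q-value step size
        self.gamma = gamma  # discount factor
        self.k = k          # hypothetical probability step size
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        # Every action starts equally likely in every state.
        self.p = [[1.0 / n_actions] * n_actions for _ in range(n_states)]
        self.rng = random.Random(seed)

    def select_action(self, s):
        # Sample from P(.|s); no epsilon or temperature parameter is used.
        return self.rng.choices(range(self.n_actions), weights=self.p[s])[0]

    def update(self, s, a, r, s_next, done=False):
        # Standard one-step Q-learning update.
        target = r if done else r + self.gamma * max(self.q[s_next])
        td_error = target - self.q[s][a]
        self.q[s][a] += self.alpha * td_error

        # Assumed rule: a positive TD error shifts probability mass toward a,
        # by an amount that shrinks as the estimate for (s, a) settles.
        if td_error > 0:
            step = self.k * min(td_error, 1.0)
            probs = self.p[s]
            for b in range(self.n_actions):
                bonus = step if b == a else 0.0
                # Dividing by (1 + step) keeps the probabilities summing to 1.
                probs[b] = (probs[b] + bonus) / (1.0 + step)
        return td_error


# Usage: a one-state, two-action episodic task where only action 1 pays off.
learner = ProbabilisticQLearner(n_states=1, n_actions=2, seed=1)
for _ in range(500):
    a = learner.select_action(0)
    learner.update(0, a, 1.0 if a == 1 else 0.0, 0, done=True)
print(learner.p[0], learner.q[0])
```

Because each shift is scaled by the TD error, the bumps shrink as the Q-values settle, so in this toy task the unrewarded action keeps a small nonzero probability rather than being switched off by a fixed exploration schedule.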

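This paper introduces a fidelity-based probabilistic Q-learning (FPQL) method that addresses the balance between exploration and exploitation in reinforcement learning and applies it to the control of quantum systems. The algorithm uses fidelity to guide the learning process and iteratively updates the probability of selecting each action in each state, which gives a natural exploration strategy rather than a prescribed one that relies on configured parameters. During learning, the algorithm can also avoid locally optimal policies and thereby accelerate the learning process.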
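
The text above states that fidelity guides the learning process. For pure states, the fidelity between a state ψ and a target ψ_f is |⟨ψ_f|ψ⟩|², and it can be illustrated on a spin-1/2 system. The sketch below computes this fidelity and, as an assumption, turns the change in fidelity produced by one control pulse into a reward. The pulse set (π/8 rotations about x or z, or idle), the target state |1⟩, and the names ACTIONS and fidelity_reward are hypothetical and are not taken from the paper.

```python
import cmath
import math

# Pure qubit states as (amplitude of |0>, amplitude of |1>).
KET0 = (1 + 0j, 0j)
KET1 = (0j, 1 + 0j)


def fidelity(psi, target):
    """Pure-state fidelity |<target|psi>|^2."""
    overlap = target[0].conjugate() * psi[0] + target[1].conjugate() * psi[1]
    return abs(overlap) ** 2


def rotate_x(psi, theta):
    """Apply exp(-i*theta*sigma_x/2) to a qubit state."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    a, b = psi
    return (c * a - 1j * s * b, -1j * s * a + c * b)


def rotate_z(psi, theta):
    """Apply exp(-i*theta*sigma_z/2) to a qubit state."""
    a, b = psi
    return (cmath.exp(-0.5j * theta) * a, cmath.exp(0.5j * theta) * b)


# Hypothetical discrete control set: small pulses an agent could choose from.
ACTIONS = {
    "x+": lambda psi: rotate_x(psi, math.pi / 8),
    "z+": lambda psi: rotate_z(psi, math.pi / 8),
    "idle": lambda psi: psi,
}


def fidelity_reward(psi, action, target=KET1):
    """Assumed shaping: reward the change in fidelity caused by one pulse."""
    nxt = ACTIONS[action](psi)
    return nxt, fidelity(nxt, target) - fidelity(psi, target)


# Usage: eight pi/8 pulses about x make a pi rotation from |0> toward |1>.
psi, total = KET0, 0.0
for _ in range(8):
    psi, r = fidelity_reward(psi, "x+")
    total += r
print(fidelity(psi, KET1), total)
```

Eight x+ pulses take |0⟩ to |1⟩ up to a global phase, and because each reward is a difference of fidelities, the per-step rewards telescope to the final fidelity minus the initial one.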

Fidelity-Based Probabilistic Q-Learning for Control of Quantum Systems