The choice of reward function in reinforcement learning (RL) has received significant attention because of its impact on system performance. Steady-state errors often arise when quadratic reward functions are used. Existing solutions based on absolute-value reward functions partially address this problem, but they tend to induce substantial fluctuations in certain system states, leading to abrupt changes. To address this, this study introduces an integral term into quadratic-type reward functions. The added term makes the RL algorithm give greater weight to long-term rewards and thereby reduces steady-state error. Experiments and performance evaluations on the Adaptive Cruise Control (ACC) model and lane-change models show that the proposed method not only effectively reduces steady-state errors but also yields smoother variations in system states.
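To illustrate the idea, the following Python sketch contrasts a plain quadratic reward with one augmented by an integral-of-error term. The abstract does not give the exact formulation, so the reward form, the weights `q`, `k_i`, and `rho`, the sampling interval `dt`, and the names `IntegralQuadraticReward` and `plain_quadratic_reward` are hypothetical assumptions made for illustration only. Under a constant tracking offset (for example, a persistent spacing error in ACC), the plain quadratic penalty stays small and constant, while the integral term makes the penalty grow the longer the offset persists.

```python
from dataclasses import dataclass


@dataclass
class IntegralQuadraticReward:
    """Hypothetical quadratic reward augmented with an integral term.

    Assumed form (not taken from the source):
        r_t = -(q * e_t**2 + k_i * z_t**2 + rho * u_t**2),
    where z_t is the running integral of the tracking error e.
    """

    q: float = 1.0     # weight on the instantaneous error (assumed)
    k_i: float = 0.1   # weight on the integrated error (assumed)
    rho: float = 0.01  # weight on control effort (assumed)
    dt: float = 0.1    # sampling interval in seconds (assumed)
    z: float = 0.0     # running integral of the error

    def reset(self) -> None:
        # Clear the accumulated error at the start of each episode.
        self.z = 0.0

    def __call__(self, error: float, action: float) -> float:
        # Forward-Euler (rectangle-rule) update of the error integral.
        self.z += error * self.dt
        return -(self.q * error**2 + self.k_i * self.z**2 + self.rho * action**2)


def plain_quadratic_reward(error: float, action: float,
                           q: float = 1.0, rho: float = 0.01) -> float:
    # Standard quadratic penalty on error and control effort only.
    return -(q * error**2 + rho * action**2)


if __name__ == "__main__":
    reward = IntegralQuadraticReward()
    # Hold a constant (assumed) offset of 0.1 with zero control for 100 steps.
    for _ in range(100):
        r_int = reward(0.1, 0.0)
    r_plain = plain_quadratic_reward(0.1, 0.0)
    print(f"plain: {r_plain:.4f}, integral-augmented after 100 steps: {r_int:.4f}")
```

Because the accumulated error is unbounded in this sketch, a practical version might clip or decay `z` to avoid windup; that choice is not specified in the source.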

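This study proposes a method for choosing the reward function in reinforcement learning: by introducing an integral term into the quadratic reward function, the RL algorithm takes long-term rewards into account while effectively reducing steady-state error and achieving smooth variation of the system states.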

Steady-State Error Compensation for Reinforcement Learning with Quadratic Rewards