This paper studies reinforcement learning (RL) under malicious falsification of cost signals and introduces a quantitative framework of attack models to understand the vulnerabilities of RL. Focusing on $Q$-learning, we show that $Q$-learning algorithms converge under stealthy attacks and bounded falsifications of cost signals. We characterize the relation between the falsified cost and the $Q$-factors, as well as the policy learned by the agent, which provides fundamental limits on feasible offensive and defensive moves. We propose a robust region in terms of the cost within which the adversary can never achieve the targeted policy. We provide conditions on the falsified cost under which the agent is misled into learning the adversary's favored policy. A numerical case study of water reservoir control illustrates the potential hazards of RL in learning-based control systems and corroborates the results.
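
To make the setting concrete, the following is a minimal sketch of tabular $Q$-learning driven by a falsified cost signal. It is not the paper's attack model or its water reservoir case study: the two-state, two-action MDP, the `falsify` helper, and all numerical values are hypothetical and chosen only for illustration. The agent minimizes cost, and the assumed adversary adds a constant `delta` to the cost of action 0.

```python
import random


def q_learning(cost, n_steps=5000, gamma=0.9, alpha=0.5, seed=0):
    """Tabular Q-learning (cost minimization) on a toy 2-state, 2-action MDP.

    Hypothetical dynamics: taking action a moves the agent to state a.
    cost[s][a] is the (possibly falsified) cost the agent observes.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]
    s = 0
    for _ in range(n_steps):
        a = rng.randrange(2)  # uniform random exploration
        s_next = a            # deterministic transition (assumption)
        target = cost[s][a] + gamma * min(q[s_next])
        q[s][a] += alpha * (target - q[s][a])
        s = s_next
    return q


def greedy_policy(q):
    """Return the cost-minimizing action in each state."""
    return [min(range(2), key=lambda a: q_s[a]) for q_s in q]


def falsify(cost, delta):
    """Hypothetical attack: add delta to the cost of action 0 in every state."""
    return [[c0 + delta, c1] for c0, c1 in cost]


true_cost = [[1.0, 2.0], [1.0, 2.0]]
policy_true = greedy_policy(q_learning(true_cost))
policy_small = greedy_policy(q_learning(falsify(true_cost, 0.5)))
policy_large = greedy_policy(q_learning(falsify(true_cost, 3.0)))
print(policy_true, policy_small, policy_large)
```

In this toy problem the learned policy stays at action 0 while `delta` is below 1 and switches to action 1 once `delta` exceeds 1. That threshold is specific to this example; it only illustrates the idea of a robust region of falsified costs and is not the bound derived in the paper.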

Deceptive Reinforcement Learning Under Adversarial Manipulations on Cost Signals