The assumption that data are independent and identically distributed underpins all machine learning. When data are collected sequentially from agent experiences, as in reinforcement learning, this assumption does not generally hold. Here, we derive a method, which we term maximum diffusion reinforcement learning, that overcomes this limitation by exploiting the statistical mechanics of ergodic processes. By decorrelating agent experiences, our approach provably enables agents to learn continually in single-shot deployments regardless of how they are initialized. Moreover, we prove that our approach generalizes well-known maximum entropy techniques, and show that it robustly exceeds state-of-the-art performance across popular benchmarks. Our results at the nexus of physics, learning, and control pave the way towards more transparent and reliable decision-making in reinforcement learning agents, such as locomoting robots and self-driving cars.

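By exploiting the statistical mechanics of ergodic processes, the authors propose a method called maximum diffusion reinforcement learning, which enables agents to learn continually in single-shot deployments regardless of how they are initialized. The method decorrelates agent experiences, is shown to exceed state-of-the-art performance on popular benchmarks, and paves the way towards transparent and reliable decision-making in reinforcement learning agents such as walking robots and self-driving vehicles.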

Maximum diffusion reinforcement learning