Recent research has turned to Reinforcement Learning (RL) to solve
challenging decision problems, as an alternative to hand-tuned heuristics. RL
can learn good policies without the need for modeling the environment's
dynamics. Despite this promise, RL remains an impractical solution for many
real-world systems problems. A particularly challenging case occurs when the
environment changes over time, i.e., it exhibits non-stationarity. In this work,
we characterize the challenges introduced by non-stationarity, shed light on
the range of approaches for addressing them, and develop a robust framework for
training RL agents in live systems. Such agents must explore and learn new
environments, without hurting the system's performance, and remember them over
time. To this end, our framework (i) identifies different environments
encountered by the live system, (ii) triggers exploration when necessary, (iii)
takes precautions to retain knowledge from prior environments, and (iv) employs
safeguards to protect the system's performance when the RL agent makes
mistakes. We apply our framework to two systems problems, straggler mitigation
and adaptive video streaming, and evaluate it against a variety of alternative
approaches using real-world and synthetic data. We show that all components of
the framework are necessary to cope with non-stationarity and provide guidance
on alternative design choices for each component.

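Summary: This work explores how a robust framework can address reinforcement
learning in non-stationary environments. The framework trains RL agents by
identifying different environments, triggering exploration, retaining
knowledge from prior environments, and protecting system performance, and it
is validated on several systems problems.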
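
To make the four components (i) to (iv) concrete, the following is a minimal Python sketch of how they might fit together in one control loop. It is an illustration under stated assumptions, not the authors' implementation: the class name `NonStationaryController`, the scalar environment feature, the distance-based environment matching, the explore-once rule, and both thresholds are hypothetical choices made only for this example.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class NonStationaryController:
    """Hypothetical skeleton of the four components; all names are assumptions."""
    distance_threshold: float = 1.0  # assumed: max distance to match a known environment
    safety_threshold: float = 0.0    # assumed: minimum acceptable mean recent reward
    centroids: list = field(default_factory=list)  # one representative feature per environment
    policies: dict = field(default_factory=dict)   # one policy slot per environment
    explored: set = field(default_factory=set)     # environments already explored

    def identify(self, feature: float) -> int:
        """(i) Match the observation to a known environment, or register a new one."""
        for env_id, centroid in enumerate(self.centroids):
            if abs(feature - centroid) <= self.distance_threshold:
                return env_id
        self.centroids.append(feature)
        return len(self.centroids) - 1

    def step(self, feature: float, recent_rewards: list) -> tuple:
        """Return (environment id, mode) for the next decision."""
        env_id = self.identify(feature)
        # (iv) Safeguard: hand control to a fallback policy if recent reward is too low.
        if recent_rewards and mean(recent_rewards) < self.safety_threshold:
            return env_id, "fallback"
        # (ii) Trigger exploration only the first time an environment is seen
        # (a simplification chosen for this sketch).
        if env_id not in self.explored:
            self.explored.add(env_id)
            # (iii) Keep a separate policy slot per environment so that learning
            # a new environment does not overwrite earlier knowledge.
            self.policies.setdefault(env_id, {})
            return env_id, "explore"
        return env_id, "exploit"
```

For example, calling `step(0.0, [])` on a fresh controller registers environment 0 and returns the `"explore"` mode, while a later call with a nearby feature and healthy rewards returns `"exploit"`. A real system would replace the scalar feature and fixed thresholds with whatever environment detector and safety criterion suit the workload.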