As communication networks grow in complexity and traffic volume, load
balancing solutions are receiving increasing attention. In particular,
reinforcement learning (RL)-based methods have outperformed traditional
rule-based methods. However, standard RL methods generally require enormous
amounts of training data and generalize poorly to traffic scenarios not
encountered during training. We
propose a policy reuse framework in which a policy selector chooses the most
suitable pre-trained RL policy to execute based on the current traffic
condition. Our method hinges on a policy bank composed of policies trained on a
diverse set of traffic scenarios. When deployed in an unseen traffic
scenario, the selector picks a policy from the bank according to the similarity
between the previous-day traffic of that scenario and the traffic each policy
observed during training. Experiments demonstrate that this framework can
outperform classical and adaptive rule-based methods by a large margin.
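The selection step can be illustrated with a minimal sketch. It assumes that each policy in the bank is stored alongside a hypothetical 24-value hourly traffic profile from its training scenario, and that similarity is measured by Euclidean distance between profiles; the abstract does not specify the similarity measure, so this choice, along with the names `BankEntry`, `traffic_distance`, and `select_policy`, is illustrative only.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class BankEntry:
    """One pre-trained policy plus the traffic it was trained on (hypothetical layout)."""
    name: str
    train_traffic: list[float]  # assumed: hourly load values from the training scenario
    policy: Callable[[Sequence[float]], int]  # maps an observation to an action


def traffic_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two traffic profiles (assumed similarity measure)."""
    if len(a) != len(b):
        raise ValueError("traffic profiles must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_policy(bank: list[BankEntry], previous_day: Sequence[float]) -> BankEntry:
    """Return the bank entry whose training traffic is closest to yesterday's traffic."""
    if not bank:
        raise ValueError("policy bank is empty")
    return min(bank, key=lambda e: traffic_distance(e.train_traffic, previous_day))


# Usage: a toy bank with two flat traffic profiles and placeholder policies.
bank = [
    BankEntry("low_load", [10.0] * 24, lambda obs: 0),
    BankEntry("high_load", [80.0] * 24, lambda obs: 1),
]
chosen = select_policy(bank, [70.0] * 24)
print(chosen.name)  # high_load
```

Keeping the selector separate from the policies means the bank can grow with new scenarios without retraining anything; only the stored traffic profiles and the distance function determine which policy runs.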

本研究提出了一种基于强化学习的策略重用框架，通过在各种交通场景下训练和存储策略，并结合流量条件，选择最适合的预训练策略以更好地解决通信网络负载均衡问题。实验结果表明，这种方法比传统的基于规则和适应性方法表现更出色。