This paper explores the potential of combining Deep Reinforcement Learning
(DRL) with Knowledge Distillation (KD) by distilling various DRL algorithms and
studying the effects of distillation. Doing so could reduce the computational
burden of deep models while maintaining their performance. The primary
objective is to provide a benchmark for evaluating the performance of different
DRL algorithms that have been refined using KD techniques, with the goal of
developing efficient and fast DRL models. The work aims to promote models that
require fewer GPU resources, learn more quickly, and make faster decisions in
complex environments. The results are expected to provide insights that
facilitate further work in this direction and support the future deployment of
resource-efficient, decision-making intelligent systems.
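
To make the idea of distilling a DRL policy concrete, the following is a minimal sketch of a temperature-softened distillation loss, written under stated assumptions rather than taken from the paper. It assumes a teacher and a student that both emit per-action logits (or Q-values) for the same batch of states, and it uses the KL divergence between the softened action distributions as the training signal. The function names, the temperature value, and the toy numbers are all hypothetical; the abstract does not say which loss or temperature the authors use.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, scaled by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions; q is clamped to avoid log(0)."""
    return sum(
        pi * math.log(pi / max(qi, eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )


def distillation_loss(teacher_logits_batch, student_logits_batch, temperature=2.0):
    """Mean KL divergence between softened teacher and student action distributions.

    Assumption: both batches hold one list of per-action logits per state,
    in the same state order.
    """
    if not teacher_logits_batch:
        raise ValueError("batch must not be empty")
    if len(teacher_logits_batch) != len(student_logits_batch):
        raise ValueError("teacher and student batches must have the same length")
    total = 0.0
    for teacher_logits, student_logits in zip(teacher_logits_batch, student_logits_batch):
        teacher_probs = softmax(teacher_logits, temperature)
        student_probs = softmax(student_logits, temperature)
        total += kl_divergence(teacher_probs, student_probs)
    return total / len(teacher_logits_batch)


# Toy batch of two states with three actions each (made-up numbers).
teacher = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
student = [[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]]
print(distillation_loss(teacher, student))
```

In a real training loop the student's logits would come from a smaller network and the loss would be minimized by gradient descent; the sketch only shows the quantity being minimized, which reaches zero when the student reproduces the teacher's action distribution.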

This paper distills a range of DRL algorithms with Knowledge Distillation (KD) and studies the effects, aiming to reduce the computational burden of deep models while maintaining performance. It provides a benchmark for evaluating DRL algorithms refined with KD, targeting models that need fewer GPU resources, learn faster, and make faster decisions in complex environments.

Leveraging Knowledge Distillation for Efficient Deep Reinforcement Learning in Resource-Constrained Environments

The open radio access network (O-RAN) architecture supports intelligent
network control algorithms as one of its core capabilities. Data-driven
applications incorporate such algorithms to optimize radio access network (RAN)
functions via RAN intelligent controllers (RICs). Deep reinforcement learning
(DRL) algorithms are among the main approaches adopted in the O-RAN literature
to solve dynamic radio resource management problems. However, despite the
benefits introduced by the O-RAN RICs, the practical adoption of DRL algorithms
in real network deployments falls behind. This is primarily due to the slow
convergence and unstable performance exhibited by DRL agents upon deployment
and when facing previously unseen network conditions. In this paper, we address
these challenges by proposing transfer learning (TL) as a core component of the
training and deployment workflows for the DRL-based closed-loop control of
O-RAN functionalities. To this end, we propose and design a hybrid TL-aided
approach that leverages the advantages of both policy reuse and distillation TL
methods to provide safe and accelerated convergence in DRL-based O-RAN slicing.
We conduct a thorough experiment that accommodates multiple services, including
real VR gaming traffic, to reflect practical scenarios of O-RAN slicing. We also
propose and implement policy-reuse-aided, distillation-aided, and non-TL-aided
DRL as three separate baselines. Compared with the baselines, the proposed
hybrid approach shows at least 7.7% and 20.7% improvements in the average
initial reward value and the percentage of converged scenarios, respectively,
and a 64.6% decrease in reward variance, while maintaining fast convergence and
enhancing generalizability.
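
As a rough illustration of the policy-reuse ingredient mentioned above, the sketch below shows one common form of it, probabilistic policy reuse: early in deployment the agent mostly follows a pre-trained expert policy, and the probability of doing so decays so that the learner gradually takes over. This is an assumption made for illustration only. The abstract does not specify how the hybrid approach combines policy reuse with distillation, and the decay schedule, function names, and stand-in policies here are hypothetical.

```python
import random


def reuse_probability(step, initial=1.0, decay=0.99):
    """Probability of following the expert policy at a given step (assumed geometric decay)."""
    return initial * (decay ** step)


def select_action(state, expert_policy, learner_policy, step, rng, initial=1.0, decay=0.99):
    """Pick an action from the expert with the current reuse probability, else from the learner.

    Returns the action together with a label naming the policy that produced it.
    """
    if rng.random() < reuse_probability(step, initial, decay):
        return expert_policy(state), "expert"
    return learner_policy(state), "learner"


# Hypothetical stand-in policies: each ignores the state and returns a fixed action.
def expert_policy(state):
    return 0


def learner_policy(state):
    return 1


rng = random.Random(0)
sources = [
    select_action(None, expert_policy, learner_policy, step, rng)[1]
    for step in range(500)
]
# Early steps lean on the expert; later steps are dominated by the learner.
print(sources[:5], sources.count("expert"))
```

Relying on the expert at the start is what would bound the initial reward from below, while the decaying probability lets the learner adapt to conditions the expert never saw.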

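Proposes a transfer learning (TL)-aided approach to deep reinforcement learning (DRL)-based slicing in the open radio access network (O-RAN). By combining policy reuse and distillation TL methods, it achieves fast convergence and better generalization, raising the initial reward value and the percentage of converged scenarios while reducing reward variance.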

Safe and Accelerated Deep Reinforcement Learning-based O-RAN Slicing: A Hybrid Transfer Learning Approach

In this paper, we present Tianshou, a highly modularized Python library for
deep reinforcement learning (DRL) that uses PyTorch as its backend. Tianshou
intends to be research-friendly by providing a flexible and reliable
infrastructure of DRL algorithms. It supports online and offline training with
more than 20 classic algorithms through a unified interface. To facilitate
related research and prove Tianshou's reliability, we have released Tianshou's
benchmark of MuJoCo environments, covering eight classic algorithms with
state-of-the-art performance. We open-sourced Tianshou at
this https URL

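This paper presents Tianshou, a highly modularized Python library with a PyTorch backend that aims to provide a flexible and reliable infrastructure for deep reinforcement learning algorithms. It supports online and offline training through a unified interface, and its reliability is demonstrated by a benchmark on MuJoCo environments.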