Digital twin (DT) platforms are increasingly regarded as a promising
technology for controlling, optimizing, and monitoring complex engineering
systems such as next-generation wireless networks. An important challenge in
adopting DT solutions is their reliance on data collected offline, lacking
direct access to the physical environment. This limitation is particularly
severe in multi-agent systems, for which conventional multi-agent reinforcement
learning (MARL) requires online interactions with the environment. A direct application
of online MARL schemes to an offline setting would generally fail due to the
epistemic uncertainty entailed by the limited availability of data. In this
work, we propose an offline MARL scheme for DT-based wireless networks that
integrates distributional RL and conservative Q-learning to address the
environment's inherent aleatoric uncertainty and the epistemic uncertainty
arising from limited data. To further exploit the offline data, we adapt the
proposed scheme to the centralized training decentralized execution framework,
allowing joint training of the agents' policies. The proposed MARL scheme,
referred to as multi-agent conservative quantile regression (MA-CQR), addresses
general risk-sensitive design criteria and is applied to the trajectory
planning problem in drone networks, showcasing its advantages.
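
The three ingredients named in the abstract (a quantile-based distributional
critic, a conservative penalty, and a risk-sensitive criterion) can be
illustrated with a minimal single-transition NumPy sketch. This is not the
authors' implementation: the function names, the quantile-midpoint convention,
the Huber threshold `kappa`, the CQL-style log-sum-exp penalty, and the use of
lower-tail CVaR at level `xi` as the risk measure are all assumptions made for
illustration.

```python
import numpy as np


def quantile_huber_loss(pred_quantiles, target_samples, kappa=1.0):
    """Quantile-regression Huber loss between N predicted quantiles
    and M target samples of the return (hypothetical helper)."""
    n = pred_quantiles.shape[0]
    taus = (np.arange(n) + 0.5) / n  # assumed quantile midpoints
    # Pairwise errors: u[i, j] = target_j - pred_i
    u = target_samples[None, :] - pred_quantiles[:, None]
    abs_u = np.abs(u)
    huber = np.where(abs_u <= kappa, 0.5 * u**2, kappa * (abs_u - 0.5 * kappa))
    # Asymmetric weight |tau_i - 1{u < 0}| penalises over/under-estimation
    weight = np.abs(taus[:, None] - (u < 0).astype(float))
    return float((weight * huber / kappa).mean(axis=1).sum())


def conservative_penalty(q_values, data_action):
    """CQL-style regulariser: log-sum-exp of Q over all actions minus
    the Q-value of the action actually present in the offline dataset."""
    m = q_values.max()
    lse = m + np.log(np.exp(q_values - m).sum())  # numerically stable
    return float(lse - q_values[data_action])


def cvar_from_quantiles(quantiles, xi):
    """Lower-tail CVaR at level xi: mean of the lowest ceil(xi * N) quantiles."""
    q = np.sort(quantiles)
    k = max(1, int(np.ceil(xi * q.shape[0])))
    return float(q[:k].mean())


# Toy example: 2 actions, 4 quantiles each (values are made up).
theta = np.array([[0.0, 0.0, 10.0, 10.0],   # high mean, heavy lower tail
                  [4.0, 4.0, 4.0, 4.0]])    # lower mean, no risk
risk_neutral_action = int(np.argmax(theta.mean(axis=1)))
risk_averse_action = int(np.argmax([cvar_from_quantiles(t, 0.5) for t in theta]))
```

In this toy example the risk-neutral criterion prefers the first action (mean 5
versus 4), while the CVaR criterion at level 0.5 prefers the second, since it
scores the two actions by their worst-case halves (0 versus 4). The penalty
term is always non-negative and grows when actions absent from the dataset
receive high Q-values, which is the sense in which such a critic is
conservative.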

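This work proposes an offline multi-agent conservative quantile regression (MA-CQR) scheme for digital-twin-based wireless networks. It integrates distributional reinforcement learning and conservative Q-learning to address the environment's inherent aleatoric uncertainty and the epistemic uncertainty caused by limited data. The scheme is applied to drone networks, demonstrating its advantages for the trajectory planning problem.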

Conservative and Risk-Aware Offline Multi-Agent Reinforcement Learning for Digital Twins

Being able to harness the power of large, static datasets for developing
autonomous multi-agent systems could unlock enormous value for real-world
applications. Many important industrial systems are multi-agent in nature and
are difficult to model using bespoke simulators. However, in industry,
distributed system processes can often be recorded during operation, and large
quantities of demonstrative data can be stored. Offline multi-agent
reinforcement learning (MARL) provides a promising paradigm for building
effective online controllers from static datasets. However, offline MARL is
still in its infancy, and, therefore, lacks standardised benchmarks, baselines
and evaluation protocols typically found in more mature subfields of RL. This
deficiency makes it difficult for the community to sensibly measure progress.
In this work, we aim to fill this gap by releasing off-the-grid MARL
(OG-MARL): a framework for generating offline MARL datasets and algorithms. We
release an initial set of datasets and baselines for cooperative offline MARL,
created using the framework, along with a standardised evaluation protocol. Our
datasets provide settings that are characteristic of real-world systems,
including complex dynamics, non-stationarity, partial observability,
suboptimality and sparse rewards, and are generated from popular online MARL
benchmarks. We hope that OG-MARL will serve the community and help steer
progress in offline MARL, while also providing an easy entry point for
researchers new to the field.

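To fill the gap left by the lack of standard benchmarks and evaluation methods in offline multi-agent reinforcement learning (MARL), this work proposes OG-MARL, a framework for offline MARL datasets and algorithms that includes a standardised evaluation protocol. The OG-MARL datasets are generated from online MARL benchmarks and feature complex dynamics, non-stationarity, partial observability, suboptimality and sparse rewards.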