In Part I of this two-part paper (Multi-Timescale Control and Communications with Deep Reinforcement Learning -- Part I: Communication-Aware Vehicle Control), we decomposed the multi-timescale control and communications (MTCC) problem in a Cellular Vehicle-to-Everything (C-V2X) system into a communication-aware Deep Reinforcement Learning (DRL)-based platoon control (PC) sub-problem and a control-aware DRL-based radio resource allocation (RRA) sub-problem. We focused on the PC sub-problem and proposed the MTCC-PC algorithm to learn an optimal PC policy given an RRA policy. In this paper (Part II), we first focus on the RRA sub-problem in MTCC, assuming a PC policy is given, and propose the MTCC-RRA algorithm to learn the RRA policy. Specifically, we incorporate the PC advantage function, which quantifies the PC performance degradation caused by observation delay, into the RRA reward function. Moreover, we augment the RRA state space with the PC action history to obtain a better-informed RRA policy. In addition, we use reward shaping and reward backpropagation prioritized experience replay (RBPER) to efficiently tackle the multi-agent and sparse-reward problems, respectively. Finally, we propose a sample- and computation-efficient training approach that jointly learns the PC and RRA policies in an iterative process. To verify the effectiveness of the proposed MTCC algorithm, we performed experiments using real driving data for the leading vehicle and compared the performance of MTCC with that of baseline DRL algorithms.
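
To make the idea of an advantage-based RRA reward concrete, the following is a minimal sketch and not the paper's actual formulation. It assumes a hypothetical toy PC action-value function `q_value`, a greedy PC policy over a discretized action grid `ACTIONS`, and a hypothetical `rra_reward` that scores the PC action chosen from a delayed observation by its advantage A(s, a) = Q(s, a) - V(s) at the true state. All names, the dynamics, and the cost weights are illustrative assumptions.

```python
import numpy as np

# Hypothetical discretized acceleration commands (assumption, not from the paper).
ACTIONS = np.linspace(-2.0, 2.0, 41)


def q_value(state, action):
    """Toy PC action-value: negative quadratic cost after one illustrative step."""
    gap_error, speed_error = state
    next_speed_error = speed_error + action          # action corrects speed error
    next_gap_error = gap_error + next_speed_error    # gap error integrates speed error
    return -(next_gap_error ** 2 + 0.1 * action ** 2)


def greedy_action(state):
    """Greedy PC policy: pick the grid action with the highest Q at `state`."""
    values = [q_value(state, a) for a in ACTIONS]
    return float(ACTIONS[int(np.argmax(values))])


def pc_advantage(true_state, action):
    """A(s, a) = Q(s, a) - V(s), where V(s) = max_a Q(s, a) for the greedy policy."""
    v = max(q_value(true_state, a) for a in ACTIONS)
    return q_value(true_state, action) - v


def rra_reward(true_state, delayed_state):
    """Hypothetical RRA reward: advantage of the action chosen from stale data."""
    action = greedy_action(delayed_state)
    return pc_advantage(true_state, action)


fresh = rra_reward((1.0, 0.5), (1.0, 0.5))   # controller sees the true state
stale = rra_reward((1.0, 0.5), (0.0, 0.0))   # controller sees an outdated state
```

Under these assumptions the reward is zero when the observation is fresh and negative when the delayed observation leads the PC policy to a worse action, so an RRA policy maximizing it is pushed toward allocations that keep the control-relevant observations timely.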

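We decompose the multi-timescale control and communications (MTCC) problem into a deep reinforcement learning (DRL)-based platoon control (PC) sub-problem and a DRL-based radio resource allocation (RRA) sub-problem, and propose the MTCC-PC algorithm for learning an optimal PC policy and the MTCC-RRA algorithm for learning the RRA policy. We use reward shaping and reward backpropagation prioritized experience replay (RBPER) to efficiently address the multi-agent and sparse-reward problems, and propose a sample- and computation-efficient training approach to jointly learn the PC and RRA policies. Experiments with real driving data compare the performance of MTCC with that of baseline DRL algorithms and verify the effectiveness of the proposed MTCC algorithm.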

Multi-Timescale Control and Communications with Deep Reinforcement Learning -- Part II: Control-Aware Radio Resource Allocation