Leveraging many sources of offline robot data requires grappling with the
heterogeneity of such data. In this paper, we focus on one particular aspect of
heterogeneity: learning from offline data collected at different control
frequencies. Across labs, the discretization of controllers, sampling rates of
sensors, and demands of a task of interest may differ, giving rise to a mixture
of frequencies in an aggregated dataset. We study how well offline
reinforcement learning (RL) algorithms can accommodate data with a mixture of
frequencies during training. We observe that the $Q$-value propagates at
different rates for different discretizations, leading to a number of learning
challenges for off-the-shelf offline RL. We present a simple yet effective
solution that enforces consistency in the rate of $Q$-value updates to
stabilize learning. By scaling the value of $N$ in $N$-step returns with the
discretization size, we effectively balance $Q$-value propagation, leading to
more stable convergence. On three simulated robotic control problems, we
empirically find that this simple approach outperforms naïve mixing by 50% on
average.

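This paper studies the heterogeneity of offline robot data, focusing on learning from offline data collected at different control frequencies. It proposes a simple yet effective method that balances $Q$-value propagation by keeping the rate of $Q$-value updates consistent, and it significantly improves performance on three simulated robotic control problems.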
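
To make the idea of scaling $N$ with the discretization concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that $N$ is chosen so that $N$ times the time step stays close to a fixed time horizon, so data recorded at a finer time step uses a larger $N$. The function names, the `horizon` parameter, and its default value are hypothetical and do not come from the paper.

```python
def n_for_timestep(dt, horizon=0.04):
    """Hypothetical rule: choose N so that N * dt stays near a fixed horizon.

    dt and horizon are in seconds; the default horizon is an illustrative
    assumption, not a value taken from the paper.
    """
    return max(1, round(horizon / dt))


def n_step_target(rewards, bootstrap_q, gamma, n):
    """Standard N-step return: sum_{k<n} gamma^k * r_k + gamma^n * Q(s_n, a_n).

    rewards      -- the next rewards along the logged trajectory (at least n)
    bootstrap_q  -- the Q-value estimate at the state reached after n steps
    gamma        -- per-step discount factor
    """
    if len(rewards) < n:
        raise ValueError("need at least n rewards")
    target = 0.0
    for k in range(n):
        target += (gamma ** k) * rewards[k]
    return target + (gamma ** n) * bootstrap_q


if __name__ == "__main__":
    # Two datasets logged at different time steps share one time horizon.
    for dt in (0.01, 0.02, 0.04):
        print(f"dt={dt}: N={n_for_timestep(dt)}")
```

Under this assumption, each Bellman backup spans roughly the same amount of physical time regardless of the control frequency of the data, which is one way to read the consistency in $Q$-value update rate that the abstract describes.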