Domain randomization is a popular technique for improving domain transfer, often used in a zero-shot setting when the target domain is unknown or cannot easily be used for training. In this work, we empirically examine the effects of domain randomization on agent generalization. Our experiments show that domain randomization may lead to suboptimal, high-variance policies, which we attribute to the uniform sampling of environment parameters. We propose Active Domain Randomization, a novel algorithm that learns a parameter sampling strategy. Our method looks for the most informative environment variations within the given randomization ranges by leveraging the discrepancies between policy rollouts in randomized and reference environment instances. We find that training more frequently on these instances leads to better overall agent generalization. In addition, when domain randomization and policy transfer fail, Active Domain Randomization offers more insight into the deficiencies of both the chosen parameter ranges and the learned policy, allowing for more focused debugging. Our experiments across various physics-based simulated tasks and a real-robot task show that this enhancement leads to more robust, consistent policies.

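This paper empirically studies how domain randomization affects agent generalization and proposes Active Domain Randomization, a novel algorithm that learns a parameter sampling strategy. The algorithm uses the discrepancies between policy rollouts in randomized and reference environment instances to find the most informative environment variations within the given randomization ranges, and training more frequently on those instances improves overall agent generalization. Experiments on various physics-based simulated tasks and a real-robot task show that this enhancement yields more robust, consistent policies.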
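
The idea of learning where to sample environment parameters, rather than sampling them uniformly, can be illustrated with a small toy example. The sketch below is not the paper's implementation: it assumes a 1-D point mass with a single randomized parameter (friction), a fixed proportional controller standing in for the agent policy, a mean absolute trajectory difference as the discrepancy measure, and a softmax distribution over parameter bins updated with a REINFORCE-style rule. All names (`rollout`, `discrepancy`, `BinSampler`, `train_sampler`) and all constants are hypothetical choices made for illustration.

```python
import math
import random


def rollout(friction, policy_gain=2.0, steps=50):
    """Toy 1-D point mass driven by a proportional controller (assumed dynamics)."""
    pos, vel, traj = 1.0, 0.0, []
    for _ in range(steps):
        force = -policy_gain * pos                  # fixed stand-in "policy"
        vel = (1.0 - friction) * vel + 0.1 * force  # friction damps the velocity
        pos += 0.1 * vel
        traj.append(pos)
    return traj


def discrepancy(traj_a, traj_b):
    """Mean absolute difference between two equal-length trajectories."""
    return sum(abs(a - b) for a, b in zip(traj_a, traj_b)) / len(traj_a)


class BinSampler:
    """Softmax distribution over equal-width bins of one parameter range."""

    def __init__(self, low, high, n_bins, lr=0.5):
        self.low, self.high, self.n_bins, self.lr = low, high, n_bins, lr
        self.logits = [0.0] * n_bins                # uniform at the start

    def probs(self):
        m = max(self.logits)                        # subtract max for stability
        exps = [math.exp(l - m) for l in self.logits]
        total = sum(exps)
        return [e / total for e in exps]

    def sample(self, rng):
        idx = rng.choices(range(self.n_bins), weights=self.probs())[0]
        width = (self.high - self.low) / self.n_bins
        return idx, self.low + (idx + rng.random()) * width

    def update(self, idx, reward, baseline):
        # REINFORCE step: gradient of log softmax is one_hot(idx) - probs.
        p = self.probs()
        advantage = reward - baseline
        for i in range(self.n_bins):
            grad = (1.0 if i == idx else 0.0) - p[i]
            self.logits[i] += self.lr * advantage * grad


def train_sampler(iterations=500, seed=0, reference_friction=0.1):
    """Shift sampling mass toward parameter bins whose rollouts differ most
    from the reference environment."""
    rng = random.Random(seed)
    sampler = BinSampler(low=0.1, high=0.9, n_bins=4)
    reference = rollout(reference_friction)
    baseline = 0.0
    for _ in range(iterations):
        idx, friction = sampler.sample(rng)
        reward = discrepancy(rollout(friction), reference)
        sampler.update(idx, reward, baseline)
        baseline = 0.9 * baseline + 0.1 * reward    # running-average baseline
    return sampler
```

Calling `train_sampler().probs()` returns the learned bin probabilities. In this toy setup the reference friction sits at the low end of the range, so bins far from it tend to produce larger discrepancies and receive more sampling mass. In a full training loop the agent policy would also be updated on the sampled environments, which this sketch omits.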

Active Domain Randomization