BriefGPT.xyz
May, 2022
Discovering Policies with DOMiNO: Diversity Optimization Maintaining Near Optimality
Tom Zahavy, Yannick Schroecker, Feryal Behbahani, Kate Baumli, Sebastian Flennerhag...
TL;DR
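This paper proposes DOMiNO, a method for balancing diversity and optimality in reinforcement learning. It formulates the search for distinct policies as a constrained Markov decision process, discovers a variety of meaningful behaviors, and is highly robust to perturbations.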
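The balance between diversity and optimality can be read as a constrained optimization problem. The sketch below is one plausible way to write it, not a formula quoted from this page: the symbols $\Pi^n$ (a set of $n$ policies), $v^{\pi^i}$ (expected return of policy $\pi^i$), $v^{*}$ (optimal return), $\alpha \in [0, 1]$ (required fraction of optimal return), and $\lambda_i$ (Lagrange multipliers) are assumed notation, and $\mathrm{Diversity}$ stands for an unspecified diversity measure.

```latex
% Assumed notation: maximize diversity subject to every policy being near-optimal.
\begin{align}
  \max_{\Pi^n} \quad & \mathrm{Diversity}(\Pi^n) \\
  \text{s.t.} \quad & v^{\pi^i} \ge \alpha \, v^{*}, \qquad i = 1, \dots, n
\end{align}

% A standard Lagrangian relaxation of the same constrained problem:
\begin{equation}
  \max_{\Pi^n} \; \min_{\lambda \ge 0} \;
  \mathrm{Diversity}(\Pi^n) + \sum_{i=1}^{n} \lambda_i \left( v^{\pi^i} - \alpha \, v^{*} \right)
\end{equation}
```

Under this reading, $\alpha = 1$ forces every policy to be optimal, while smaller values of $\alpha$ leave more room for diversity.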
Abstract
Finding different solutions to the same problem is a key aspect of intelligence associated with creativity and adaptation to novel situations. In reinforcement learning, a set of diverse policies can be useful for...