Besides independent learning, the human learning process is greatly improved by
summarizing what has been learned, communicating it with peers, and
subsequently fusing knowledge from different sources to assist the current
learning goal. This collaborative learning procedure ensures that th
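This paper proposes a new algorithm, Learning to Coordinate and Teach Reinforcement (LeCTR), in which each agent in cooperative multiagent reinforcement learning learns when to advise and what to advise, thereby improving team-wide performance and learning. Empirical comparisons show that our teaching agents not only learn faster, but also learn to coordinate in tasks where existing methods fail.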