Open-Ended Reinforcement Learning with Neural Reward Functions

February 2022

Open-Ended Reinforcement Learning with Neural Reward Functions

Robert Meier, Asier Mujika

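TL;DR: This work proposes encoding reward functions as neural networks and training them iteratively to encourage increasingly complex behavior. This enables unsupervised learning in high-dimensional robotic environments and pixel-based environments, yielding a rich set of skills that includes front flips and running on one leg.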

Abstract

Inspired by the great success of unsupervised learning in Computer Vision and Natural Language Processing, the reinforcement learning community has recently started to focus more on unsupervised discovery of skil