Unsupervised Discovery of Continuous Skills on a Sphere

May 2023

Takahisa Imagawa, Takuya Hiraoka, Yoshimasa Tsuruoka

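TL;DR: This paper proposes DISCS, a method that maximizes the mutual information between skills and states in order to learn a potentially infinite number of distinct skills, each of which corresponds to a continuous value on a sphere. Experiments in the MuJoCo Ant robot control environment show that DISCS learns more diverse skills than other methods.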
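To make the idea of "skills as points on a sphere" concrete, here is a minimal sketch. It is not the authors' implementation. It assumes skills are drawn uniformly from the unit sphere, and it uses a hypothetical reward equal to a scaled cosine similarity between the sampled skill and a direction predicted from the state. That reward is the log-density of a von Mises-Fisher distribution up to an additive constant, which is one common way to lower-bound mutual information with a variational posterior. The names `sample_skill`, `skill_reward`, and `kappa` are illustrative assumptions.

```python
import math
import random


def sample_skill(dim, rng):
    """Draw one skill vector uniformly from the unit sphere in R^dim."""
    # Normalizing an isotropic Gaussian sample yields a uniformly
    # distributed direction on the sphere.
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def skill_reward(skill, predicted_direction, kappa=1.0):
    """Hypothetical intrinsic reward (an assumption, not from the paper).

    Returns kappa * cos(angle between skill and predicted_direction),
    i.e. a von Mises-Fisher log-density up to an additive constant.
    """
    dot = sum(a * b for a, b in zip(skill, predicted_direction))
    norm = math.sqrt(sum(b * b for b in predicted_direction))
    return kappa * dot / norm


rng = random.Random(0)
z = sample_skill(3, rng)
```

With this sketch, a state-conditioned prediction that points in the same direction as the skill receives the maximum reward `kappa`, and one that points in the opposite direction receives `-kappa`.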

Abstract

Recently, methods for learning diverse skills to generate various behaviors without external rewards have been actively studied as a form of unsupervised reinforcement learning. However, most of the existing meth