Language-conditioned robot behavior plays a vital role in executing complex tasks by associating human commands or instructions with perception and actions. The ability to compose long-horizon tasks based on unconstrained language instructions necessitates the acquisition of a diverse set of general-purpose skills. However, acquiring inherent primitive skills in a coupled and long-horizon environment without external rewards or human supervision presents significant challenges. In this paper, we evaluate the relationship between skills and language instructions from a mathematical perspective, employing two forms of mutual information within the framework of language-conditioned policy learning. To maximize the mutual information between language and skills in an unsupervised manner, we propose an end-to-end imitation learning approach known as Language Conditioned Skill Discovery (LCSD). Specifically, we utilize vector quantization to learn discrete latent skills and leverage skill sequences of trajectories to reconstruct high-level semantic instructions. Through extensive experiments on language-conditioned robotic navigation and manipulation tasks, encompassing BabyAI, LORel, and CALVIN, we demonstrate the superiority of our method over prior works. Our approach exhibits enhanced generalization capabilities towards unseen tasks, improved skill interpretability, and notably higher rates of task completion success.
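The vector-quantization step mentioned above can be illustrated with a nearest-codebook lookup. This is a minimal sketch of generic vector quantization, not the authors' LCSD implementation: the function name `quantize`, the 2-D codebook, and the example latents are all hypothetical, and the learned encoder, codebook updates, and training losses are omitted.

```python
def quantize(vector, codebook):
    """Return (index, code) for the codebook entry nearest to `vector`."""
    def sq_dist(a, b):
        # Squared Euclidean distance between two equal-length vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    index = min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))
    return index, codebook[index]


# Hypothetical codebook: each entry stands for one discrete latent skill.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

# Hypothetical continuous encoder outputs for four trajectory segments.
latents = [(0.1, -0.2), (0.9, 0.1), (0.2, 0.8), (0.8, 0.3)]

# Mapping each latent to its nearest code yields a discrete skill sequence.
skill_sequence = [quantize(z, codebook)[0] for z in latents]
print(skill_sequence)
```

In a setup like the one the abstract describes, a discrete sequence of this kind would be the input from which the language instruction is reconstructed; how that reconstruction is done is not specified here.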

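Summary: The paper examines, from a mathematical perspective, the relationship between skills and language instructions within the framework of language-conditioned policy learning, and proposes an end-to-end imitation learning method called Language Conditioned Skill Discovery (LCSD). By maximizing the mutual information between language and skills, LCSD learns discrete latent skills without supervision and uses skill sequences to reconstruct high-level semantic instructions. Extensive experiments on BabyAI, LORel, and CALVIN show the method's advantages on language-conditioned robotic navigation and manipulation tasks, including stronger generalization to unseen tasks, improved skill interpretability, and markedly higher task completion success rates.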

Rethinking Mutual Information for Language Conditioned Skill Discovery on Imitation Learning