Learning reward functions for physical skills is challenging due to the vast
spectrum of skills, the high dimensionality of the state and action spaces, and
nuanced sensory feedback. The complexity of these tasks makes acquiring expert
demonstration data both costly and time-consuming. Large Language Models (LLMs)
contain valuable task-related knowledge that can aid in learning these reward
functions. However, directly applying LLMs to propose reward functions has
limitations, such as numerical instability and an inability to incorporate
environment feedback. We aim to extract task knowledge from
LLMs using environment feedback to create efficient reward functions for
physical skills. Our approach consists of two components. We first use the LLM
to propose features and parameterization of the reward function. Next, we
update the parameters of this proposed reward function through an iterative
self-alignment process. In particular, this process minimizes the ranking
inconsistency between the LLM and our learned reward functions based on the new
observations. We validate our method on three simulated physical skill
learning tasks, and the results support our design choices.
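
The self-alignment step described above can be illustrated with a minimal
sketch. Everything in it is an assumption made for illustration and is not
taken from the paper: the reward is linear in two hypothetical features
(distance to goal, energy used), the LLM's ranking is replaced by a hand-written
stand-in function, new observations are random samples rather than policy
rollouts, and the ranking inconsistency is reduced with a pairwise logistic
loss, which is one common choice and may differ from the authors' objective.

```python
import math
import random

def reward(weights, features):
    """Linear reward: weighted sum of the (hypothetical) feature values."""
    return sum(w * f for w, f in zip(weights, features))

def stand_in_ranker(a, b):
    """Hypothetical stand-in for the LLM's preference.

    Prefers the trajectory with lower distance + 0.1 * energy and
    returns the pair ordered as (better, worse).
    """
    def score(t):
        return t[0] + 0.1 * t[1]
    return (a, b) if score(a) < score(b) else (b, a)

def ranking_inconsistency(weights, pairs):
    """Fraction of pairs where the reward does not rank `better` above `worse`."""
    wrong = sum(
        1 for better, worse in pairs
        if reward(weights, better) <= reward(weights, worse)
    )
    return wrong / len(pairs)

def alignment_step(weights, pairs, lr=0.5, steps=100):
    """Gradient descent on the pairwise logistic loss -log(sigmoid(margin))."""
    weights = list(weights)
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for better, worse in pairs:
            margin = reward(weights, better) - reward(weights, worse)
            # Derivative of the loss w.r.t. the margin; clamp to avoid overflow.
            coeff = -1.0 / (1.0 + math.exp(min(margin, 50.0)))
            for i in range(len(weights)):
                grad[i] += coeff * (better[i] - worse[i])
        weights = [w - lr * g / len(pairs) for w, g in zip(weights, grad)]
    return weights

def sample_pairs(rng, n):
    """Draw n random feature pairs and order each with the stand-in ranker."""
    pairs = []
    for _ in range(n):
        a = (rng.random(), rng.random())
        b = (rng.random(), rng.random())
        pairs.append(stand_in_ranker(a, b))
    return pairs

def self_align(rounds=5, pairs_per_round=40, seed=0):
    """Iterate: gather new observations, then update the reward parameters."""
    rng = random.Random(seed)
    weights = [0.0, 0.0]  # initial parameterization (assumed)
    for _ in range(rounds):
        pairs = sample_pairs(rng, pairs_per_round)
        weights = alignment_step(weights, pairs)
    return weights
```

Calling `self_align()` returns the updated weights, and
`ranking_inconsistency(weights, sample_pairs(random.Random(1), 200))` gives a
rough check of how often the learned reward still disagrees with the stand-in
ranker on fresh pairs.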

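Summary: The work uses a large language model together with environment
feedback to extract task knowledge and create efficient reward functions for
physical skills. The language model first proposes the features and
parameterization of the reward function; the parameters are then updated through
an iterative self-alignment process so that the rankings given by the language
model and by the learned reward function agree. The method is validated on three
simulated physical skill learning tasks.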

Learning Reward for Physical Skills using Large Language Model

Large language models (LLMs) provide capabilities far beyond sentence
completion, including question answering, summarization, and natural-language
inference. While many of these capabilities have potential application to
cognitive systems, our research is exploiting language models as a source of
task knowledge for cognitive agents, that is, agents realized via a cognitive
architecture. We identify challenges and opportunities for using language
models as an external knowledge source for cognitive systems, and we outline
possible ways to improve the effectiveness of knowledge extraction by
integrating extraction with cognitive architecture capabilities, illustrated
with examples from our recent work in this area.

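Summary: Challenges and opportunities in using large language models as a source
of task knowledge for cognitive systems and cognitive agents, and in improving
the effectiveness of knowledge extraction by integrating extraction with
cognitive architecture capabilities.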