Large-scale generative models have been shown to be useful for sampling meaningful
candidate solutions, yet they often overlook task constraints and user
preferences. Their full power is better harnessed when the models are coupled
with external verifiers and the final solutions are derived iteratively or
progressively according to the verification feedback. In the context of
embodied AI, verification often involves only assessing whether the goal
conditions specified in the instructions have been met. Nonetheless, for these
agents to be seamlessly integrated into daily life, it is crucial to account
for a broader range of constraints and preferences beyond bare task success
(e.g., a robot should grasp bread with care to avoid significant deformations).
However, given the unbounded scope of robot tasks, it is infeasible to
construct scripted verifiers akin to those used for explicit-knowledge tasks
like the game of Go and theorem proving. This raises the question: when no sound
verifier is available, can we use large vision and language models (VLMs),
which are approximately omniscient, as scalable Behavior Critics to catch
undesirable robot behaviors in videos? To answer this, we first construct a
benchmark that contains diverse cases of goal-reaching yet undesirable robot
policies. Then, we comprehensively evaluate VLM critics to gain a deeper
understanding of their strengths and failure modes. Based on the evaluation, we
provide guidelines on how to effectively utilize VLM critiques and showcase a
practical way to integrate the feedback into an iterative process of policy
refinement. The dataset and codebase are released at:
this https URL
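
The coupling of a generator with a critic described above can be sketched as a simple generate, critique, refine loop. This is only an illustrative sketch under stated assumptions, not the paper's pipeline: `Policy`, `grip_force`, `propose_policy`, and `critique` are hypothetical names, and the rule-based critic stands in for a VLM that would inspect a video of the rollout.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Policy:
    grip_force: float  # hypothetical policy parameter, in newtons


def critique(policy: Policy) -> List[str]:
    """Stand-in critic (assumption): a real critic would be a VLM judging a video."""
    issues = []
    if policy.grip_force > 5.0:
        issues.append("grasp deforms the bread")
    return issues


def propose_policy(previous: Policy, critiques: List[str]) -> Policy:
    """Stand-in generator (assumption): halve the grip force if deformation is reported."""
    if any("deform" in c for c in critiques):
        return Policy(grip_force=previous.grip_force * 0.5)
    return previous


def refine(initial: Policy, max_rounds: int = 10) -> Tuple[Policy, int]:
    """Iterate until the critic raises no issues; return the policy and rounds used."""
    policy = initial
    for round_idx in range(max_rounds):
        issues = critique(policy)
        if not issues:
            return policy, round_idx
        policy = propose_policy(policy, issues)
    return policy, max_rounds
```

Calling `refine(Policy(grip_force=40.0))` keeps revising the policy until the stand-in critic stops reporting deformation. The loop is capped by `max_rounds` because a critic that is not sound may never be satisfied.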

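In embodied AI, large-scale generative models are coupled with external verifiers, and final solutions are derived iteratively from the verification feedback. Verification usually checks only whether the goal conditions stated in the instructions have been met, but integrating agents seamlessly into daily life requires going beyond task success to cover a broad range of constraints and personal preferences. To this end, the work builds a benchmark, comprehensively evaluates the strengths and failure modes of vision and language models in identifying undesirable robot behaviors in videos, provides guidelines for using model critiques effectively, and showcases a practical way to integrate the feedback into an iterative process of policy refinement.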

Task Success Is Not Enough: Investigating the Use of Video-Language Models as Behavior Critics to Catch Undesirable Agent Behaviors

"Task Success" is not Enough: Investigating the Use of Video-Language Models as Behavior Critics for Catching Undesirable Agent Behaviors

Tasks where the set of possible actions depends discontinuously on the state
pose a significant challenge for current reinforcement learning algorithms. For
example, a locked door must be first unlocked, and then the handle turned
before the door can be opened. The sequential nature of these tasks makes
obtaining final rewards difficult, and transferring information between task
variants using continuous learned values such as weights rather than discrete
symbols can be inefficient. Our key insight is that agents that act and think
symbolically are often more effective in dealing with these tasks. We propose a
memory-based learning approach that leverages the symbolic nature of
constraints and temporal ordering of actions in these tasks to quickly acquire
and transfer high-level information. We evaluate the performance of
memory-based learning on both real and simulated tasks with approximately
discontinuous constraints between states and actions, and show our method
learns to solve these tasks an order of magnitude faster than both model-based
and model-free deep reinforcement learning methods.
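
The locked-door example can be made concrete with a toy sketch. It is not the authors' algorithm: `DoorEnv`, `solve`, and the dictionary-based memory are hypothetical, and they only illustrate how remembering a symbolic ordering of actions lets a later episode skip exploration.

```python
import random
from typing import Dict, List, Tuple

ACTIONS = ["unlock", "turn_handle", "push"]


class DoorEnv:
    """Toy door: each action only has an effect in the right discrete state."""

    TRANSITIONS = {
        ("locked", "unlock"): "unlocked",
        ("unlocked", "turn_handle"): "unlatched",
        ("unlatched", "push"): "open",
    }

    def __init__(self) -> None:
        self.state = "locked"

    def step(self, action: str) -> bool:
        """Apply an action; invalid actions leave the state unchanged."""
        self.state = self.TRANSITIONS.get((self.state, action), self.state)
        return self.state == "open"


def solve(memory: Dict[str, List[str]], task: str,
          rng: random.Random) -> Tuple[List[str], int]:
    """Replay a remembered ordering if one exists, otherwise explore randomly.

    Returns the successful action ordering and the number of environment steps.
    """
    env = DoorEnv()
    if task in memory:
        plan = memory[task]
        for action in plan:
            env.step(action)
        return plan, len(plan)

    useful: List[str] = []
    steps = 0
    while env.state != "open":
        action = rng.choice(ACTIONS)
        before = env.state
        env.step(action)
        steps += 1
        if env.state != before:
            useful.append(action)  # keep only actions that changed the symbolic state
    memory[task] = useful
    return useful, steps
```

The first call to `solve` with an empty memory explores by trial and error. A second call for the same task replays the stored three-action ordering directly.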

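A memory-based learning approach exploits the symbolic nature of the tasks and the temporal ordering of actions to quickly acquire and transfer high-level information. It solves tasks with discontinuous constraints faster than both model-based and model-free deep reinforcement learning methods.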

Using Memory-Based Learning to Solve Tasks with State-Action Constraints

Using Memory-Based Learning to Solve Tasks with State-Action Constraints

Sampling-based motion planning under task constraints is challenging because
the null-measure constraint manifold in the configuration space makes rejection
sampling extremely inefficient, if not impossible. This paper presents a
learning-based sampling strategy for constrained motion planning problems. We
investigate the use of two well-known deep generative models, the Conditional
Variational Autoencoder (CVAE) and the Conditional Generative Adversarial Net
(CGAN), to generate constraint-satisfying sample configurations. Instead of
precomputed graphs, we use generative models conditioned on constraint
parameters for approximating the constraint manifold. This approach allows for
the efficient drawing of constraint-satisfying samples online without any need
for modification of available sampling-based motion planning algorithms. We
evaluate the efficiency of these two generative models in terms of their
sampling accuracy and coverage of sampling distribution. Simulations and
experiments are also conducted for different constraint tasks on two robotic
platforms.
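
A toy example illustrates why rejection sampling struggles on a measure-zero constraint manifold and what a conditioned generator offers instead. The assumptions are loud ones: the manifold is a circle of radius `radius` in the plane, and `decoder` is a hand-written stand-in for a learned CVAE or CGAN generator conditioned on the constraint parameter, not a trained model.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def rejection_sample(radius: float, tol: float, n_trials: int,
                     rng: random.Random) -> List[Point]:
    """Sample the box [-2, 2]^2 uniformly and keep points within tol of the circle."""
    kept = []
    for _ in range(n_trials):
        x, y = rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0)
        if abs(math.hypot(x, y) - radius) <= tol:
            kept.append((x, y))
    return kept


def decoder(latent: float, radius: float) -> Point:
    """Stand-in (assumption) for a learned generator conditioned on the radius."""
    return (radius * math.cos(latent), radius * math.sin(latent))


def generative_sample(radius: float, n_samples: int,
                      rng: random.Random) -> List[Point]:
    """Draw latents from a simple prior and decode them onto the manifold."""
    return [decoder(rng.uniform(0.0, 2.0 * math.pi), radius)
            for _ in range(n_samples)]
```

With a tolerance of `1e-3`, the accepted band covers well under 0.1% of the box, so nearly all rejection trials are wasted. Every decoded sample lies on the circle by construction. A learned generator would only approximate the manifold, which is why the abstract evaluates sampling accuracy and coverage.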

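This paper presents a sampling strategy based on deep generative models for motion planning under task constraints. It uses two deep generative models, CVAE and CGAN, to generate constraint-satisfying sample configurations, and evaluates their sampling accuracy and coverage of the sampling distribution through simulations and experiments.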