Detecting successful behaviour is crucial for training intelligent agents. As
such, generalisable reward models are a prerequisite for agents that can learn
to generalise their behaviour. In this work we focus on developing robust
success detectors that leverage large, pretrained visio
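In applied AI settings, large-scale generative models can be paired with
external verifiers, with the final solution derived iteratively from
verification feedback that checks whether the goal conditions stated in the
specification have been met. For agents to integrate seamlessly into daily
life, this verification should extend beyond task success to a wide range of
constraints and personal preferences. To this end, a benchmark is constructed
that comprehensively evaluates the strengths and failure modes of vision and
language models in identifying undesirable robot behaviours in videos. The
evaluation provides guidelines for using model critiques effectively and
demonstrates a practical way to incorporate this feedback into an iterative
policy-improvement process.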