In this study, we introduce a novel visual imitation network with a spatial attention module for robotic assisted feeding (RAF). The goal is to acquire (i.e., scoop) food items from a bowl, a task in which robust and adaptive food manipulation is particularly challenging. To address this, we propose a framework that integrates visual perception with imitation learning, enabling the robot to handle diverse scenarios during scooping. Our approach, named AVIL (adaptive visual imitation learning), is adaptable and robust across bowl configurations that differ in material, size, and position, and across diverse food types including granular, semi-solid, and liquid, even in the presence of distractors. We validate the approach through experiments on a real robot and compare its performance with a baseline. The results show improvement over the baseline across all scenarios, with up to a 2.5-fold gain on a success metric. Notably, our model, trained solely on data from a transparent glass bowl containing granular cereals, generalizes when tested zero-shot on other bowl configurations with different types of food.

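By integrating visual perception with imitation learning, we propose a novel visual imitation network with a spatial attention module for robotic assisted feeding. Our approach, named AVIL (adaptive visual imitation learning), exhibits adaptability and robustness across diverse scenarios involving different bowl configurations, materials, sizes, and positions, as well as different food types, even in the presence of distractors. We validate the effectiveness of our approach through experiments on a real robot and compare it with a baseline model. The results show that, across all scenarios, our model achieves up to a 2.5-fold improvement over the baseline in terms of a success metric. Notably, our model is trained solely on data from a transparent glass bowl containing granular cereals, yet it shows generalization ability when tested zero-shot on other bowl configurations with different types of food.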

Adaptive Visual Imitation Learning for Robotic Assisted Feeding Across Varied Bowl Configurations and Food Types