Training novice users in the different skills of excavator operation
requires the presence of expert teachers. Given the complexity of the
task, skilled experts are comparatively expensive to find, and the teaching
process is time-consuming and demands precise focus. Moreover, since humans tend to be
biased, the evaluation process is noisy and will l
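This work proposes the MEDAL++ algorithm, which uses a small number of expert demonstrations to practice tasks autonomously, without human supervision or oversight, by simultaneously learning to perform the task and to undo it. It infers the reward function from the demonstrations and learns both the policy and the reward function end-to-end from high-dimensional visual inputs. In both simulated and real-robot experiments, MEDAL++ performs strongly, proving more data-efficient and achieving better performance than existing vision-based methods.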
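
The idea of practicing a task together with its reverse, so that no external reset is needed, can be illustrated with a toy sketch. This is not the MEDAL++ implementation: it assumes a hypothetical 5-state chain in place of a robot, tabular Q-learning in place of learning from visual input, and a hand-written reward (1 on the state where a demonstration ends) in place of a reward function inferred from demonstrations. All names (`practice`, `make_reward`, `greedy`) are illustrative.

```python
import random

N_STATES = 5          # toy chain: states 0..4 (assumed environment)
ACTIONS = (-1, +1)    # step left or step right


def step(state, action):
    """Deterministic chain dynamics with walls at both ends."""
    return min(max(state + action, 0), N_STATES - 1)


def make_reward(demo_end_states):
    """Stand-in for a learned reward: 1.0 on states where demos end."""
    targets = set(demo_end_states)
    return lambda next_state: 1.0 if next_state in targets else 0.0


def practice(total_steps=4000, phase_len=20, eps=0.3, alpha=0.5,
             gamma=0.9, seed=0):
    rng = random.Random(seed)
    # Hypothetical demos: forward demos end at the goal (state 4),
    # backward demos end at the initial state (state 0).
    rewards = {
        "forward": make_reward([N_STATES - 1]),
        "backward": make_reward([0]),
    }
    q = {name: [[0.0, 0.0] for _ in range(N_STATES)] for name in rewards}
    state, resets = 0, 1  # the initial placement is the only reset
    for t in range(total_steps):
        # Alternate which policy controls the agent every phase_len steps.
        active = "forward" if (t // phase_len) % 2 == 0 else "backward"
        if rng.random() < eps:
            a = rng.randrange(2)  # explore
        else:
            a = max(range(2), key=lambda i: q[active][state][i])
        nxt = step(state, ACTIONS[a])
        # Off-policy update: both value tables learn from every transition.
        for name, reward_fn in rewards.items():
            target = reward_fn(nxt) + gamma * max(q[name][nxt])
            q[name][state][a] += alpha * (target - q[name][state][a])
        state = nxt  # no reset between phases: the backward policy undoes the task
    return q, resets


def greedy(q_table):
    """Greedy action (as -1 or +1) for each state."""
    return [ACTIONS[max(range(2), key=lambda i: row[i])] for row in q_table]


if __name__ == "__main__":
    q, resets = practice()
    print("forward policy: ", greedy(q["forward"]))
    print("backward policy:", greedy(q["backward"]))
    print("environment resets used:", resets)
```

The point of the sketch is the control flow: the backward policy returns the agent toward the initial state, so the forward policy can practice again without anyone resetting the environment.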