Jun 2024
MEReQ: Max-Ent Residual-Q Inverse RL for Sample-Efficient Alignment from Intervention
Yuxin Chen, Chen Tang, Chenran Li, Ran Tian, Peter Stone...
TL;DR: MEReQ (Maximum-Entropy Residual-Q Inverse Reinforcement Learning) enables sample-efficient policy alignment from human intervention.