We prove that Wasserstein inverse reinforcement learning enables the learner's
reward values to imitate the expert's reward values within a finite number of
iterations for multi-objective optimization problems. Moreover, we prove that
Wasserstein inverse reinforcement learning enables the learner's optimal
solutions to imitate the expert's optimal solutions for multi-objective
optimization problems with lexicographic order.

A proof of imitation of Wasserstein inverse reinforcement learning for multi-objective optimization

We show the convergence of Wasserstein inverse reinforcement learning (WIRL)
for multi-objective optimizations with the projective subgradient method by
formulating an inverse problem of the optimization problem that is equivalent
to WIRL for multi-objective optimizations.
In addition, we prove the convergence of inverse reinforcement learning (maximum
entropy inverse reinforcement learning, guided cost learning) for multi-objective
optimization with the projective subgradient method.
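
As a rough illustration of the kind of iteration involved, the sketch below assumes that the "projective subgradient method" refers to the standard projected subgradient scheme: take a subgradient step, then project back onto the feasible set. It is not the WIRL formulation itself. The objective (an L1 distance to a hypothetical `target`), the feasible set (a Euclidean ball), the step size 1/sqrt(k), and all function names are assumptions chosen only to keep the example small.

```python
import math


def project_onto_ball(x, radius=1.0):
    """Euclidean projection of x onto the ball {y : ||y||_2 <= radius}."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    return [radius * v / norm for v in x]


def objective(x, target):
    """Illustrative nonsmooth convex objective: f(x) = sum_i |x_i - target_i|."""
    return sum(abs(xi - ti) for xi, ti in zip(x, target))


def l1_subgradient(x, target):
    """Return one subgradient of the objective at x (sign of each residual)."""
    return [float((xi > ti) - (xi < ti)) for xi, ti in zip(x, target)]


def projected_subgradient(target, steps=200, radius=1.0):
    """Minimize the objective over the ball, tracking the best iterate seen."""
    x = [0.0] * len(target)
    best_x, best_f = list(x), objective(x, target)
    for k in range(1, steps + 1):
        g = l1_subgradient(x, target)
        step = 1.0 / math.sqrt(k)  # diminishing step size (an assumption)
        x = project_onto_ball(
            [xi - step * gi for xi, gi in zip(x, g)], radius
        )
        fx = objective(x, target)
        if fx < best_f:  # subgradient steps need not decrease f monotonically
            best_x, best_f = list(x), fx
    return best_x, best_f


if __name__ == "__main__":
    best_x, best_f = projected_subgradient([2.0, 1.0])
    print(best_x, best_f)
```

With the hypothetical target (2, 1), which lies outside the unit ball, the constrained minimizer is the boundary point where both coordinates equal 1/sqrt(2), and the minimum value is 3 - sqrt(2). The best iterate is tracked separately because a subgradient method is not a descent method.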
