This work presents a multiscale framework for solving an inverse reinforcement learning (IRL) problem for continuous-time, continuous-state stochastic systems. We use a diffusion wavelet representation of the associated Markov chain to abstract the state space. This allows the large and geometrically complex decision space to be handled effectively, and it also provides more interpretable representations of both the demonstrated state trajectories and the policy recovered by IRL. In the proposed framework, the problem is divided into global and local IRL: a global approximation of the optimal value functions is obtained using coarse features, and the local details are quantified using fine local features. An illustrative numerical example of robot path control in a complex environment is presented to verify the proposed method.
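As a rough illustration of why a diffusion-based multiscale representation can abstract a Markov chain, the sketch below builds a symmetric diffusion operator on a small path graph and tracks the numerical rank of its dyadic powers. Diffusion wavelets exploit this decay in rank to construct progressively coarser bases. The graph, the function names `diffusion_operator` and `numerical_ranks`, and the threshold `eps` are assumptions chosen for illustration. This is not the construction used in this work, and it omits the orthogonalization step of a full diffusion wavelet algorithm.

```python
import numpy as np


def diffusion_operator(n):
    """Symmetric normalized random-walk operator on a path graph with self-loops.

    Hypothetical toy state space: n states on a line, each linked to its neighbors.
    """
    W = np.eye(n)                      # self-loops
    idx = np.arange(n - 1)
    W[idx, idx + 1] = 1.0              # edge i -> i+1
    W[idx + 1, idx] = 1.0              # edge i+1 -> i
    d = W.sum(axis=1)                  # node degrees
    return W / np.sqrt(np.outer(d, d))  # D^{-1/2} W D^{-1/2}


def numerical_ranks(T, levels, eps=1e-6):
    """Numerical rank of T^(2^j) for j = 0, ..., levels - 1."""
    ranks = []
    P = T.copy()
    for _ in range(levels):
        s = np.linalg.svd(P, compute_uv=False)
        ranks.append(int((s > eps).sum()))  # singular values above eps
        P = P @ P                           # next dyadic power by squaring
    return ranks


T = diffusion_operator(64)
ranks = numerical_ranks(T, levels=8)
print(ranks)
```

The printed ranks do not increase with the level, so a long-time diffusion step can be described with far fewer basis functions than there are states. Features of this coarse kind are what a global stage could use, while finer scales would keep the local detail.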

Multiscale Inverse Reinforcement Learning Using Diffusion Wavelets