This paper aims at the algorithmic/theoretical core of reinforcement learning (RL) by introducing the novel class of proximal Bellman mappings. These mappings are defined in reproducing kernel Hilbert spaces (RKHSs) to benefit from the rich approximation properties and the inner product of RKHSs. They are shown to belong to the powerful Hilbertian family of (firmly) nonexpansive mappings, regardless of the values of their discount factors, and they possess ample degrees of design freedom, enough even to reproduce attributes of the classical Bellman mappings and to pave the way for novel RL designs. An approximate policy-iteration scheme is built on the proposed class of mappings to solve the problem of selecting online, at every time instant, the "optimal" exponent $p$ in a $p$-norm loss to combat outliers in linear adaptive filtering, without training data or any knowledge of the statistical properties of the outliers. Numerical tests on synthetic data showcase the superior performance of the proposed framework over several non-RL and kernel-based RL schemes.
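
To make the role of the exponent $p$ concrete, the following is a minimal sketch of a linear adaptive filter driven by a $p$-norm loss $|e_n|^p$ with a fixed $p$, often called least mean $p$-power filtering. It is not the RL scheme proposed in the paper, which selects $p$ online; the function name `lmp_filter`, the step size, and the restriction $1 \le p \le 2$ are assumptions made only for illustration.

```python
import numpy as np


def lmp_filter(X, y, p=1.5, step=0.01):
    """Linear adaptive filter minimizing |e|^p with a FIXED exponent p.

    Illustrative sketch only (hypothetical helper, not the paper's method).
    X: (n_samples, n_taps) input vectors; y: (n_samples,) desired outputs.
    Assumes 1 <= p <= 2 so that |e|^(p-1) stays finite at e = 0.
    """
    if not 1.0 <= p <= 2.0:
        raise ValueError("this sketch assumes 1 <= p <= 2")
    w = np.zeros(X.shape[1])
    for x_n, y_n in zip(X, y):
        e = y_n - x_n @ w  # a-priori estimation error
        # Stochastic-gradient step on |e|^p:
        # d|e|^p / dw = -p * |e|^(p-1) * sign(e) * x_n
        w = w + step * p * np.abs(e) ** (p - 1) * np.sign(e) * x_n
    return w
```

With $p = 2$ the update reduces to the classical least-mean-squares rule (up to a factor of 2 in the step size), while $p = 1$ gives a sign-error update whose step does not grow with the magnitude of an outlier. The best choice between these extremes depends on the unknown outlier statistics, which is the selection problem the abstract refers to.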

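This paper introduces a class of proximal Bellman mappings defined in reproducing kernel Hilbert spaces. For all values of the discount factor, these mappings belong to the powerful Hilbertian family of nonexpansive mappings, offer ample degrees of design freedom, can reproduce attributes of the classical Bellman mappings, and pave the way for novel reinforcement-learning designs. An approximate policy-iteration scheme is built on the proposed class of mappings to select online the "optimal" exponent $p$ of a $p$-norm loss, in order to combat outliers in linear adaptive filtering, without training data or any knowledge of the statistical properties of the outliers. Numerical tests on synthetic data showcase the superior performance of the framework over several non-reinforcement-learning and kernel-based reinforcement-learning schemes.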

Proximal Bellman Mappings for Reinforcement Learning and Their Application to Robust Adaptive Filtering