Bayesian filtering is the mainstream framework for state estimation in
dynamic systems. Its standard version alternates between the total probability
rule and Bayes' law, so how conditional probability is defined and computed is
critical to inferring the state distribution. Previously, the
conditional probability has been assumed to be exactly known, representing a
measure of the probability that one event occurs given a second event. In
this paper, we find that by introducing an additional event that stipulates an
inequality condition, we can transform the conditional probability into a
special integral that is analogous to convolution. Based on this
transformation, we show that both transition probability and output probability
can be generalized to convolutional forms, resulting in a more general
filtering framework that we call convolutional Bayesian filtering. This new
framework encompasses standard Bayesian filtering as a special case when the
distance metric of the inequality condition is chosen to be the Dirac delta
function. It also allows for a more nuanced consideration of model mismatch by
choosing different types of inequality conditions. For instance, when the
distance metric is defined in a distributional sense, the transition
probability and output probability can be approximated by simply raising them
to fractional powers. Under this framework, a robust version of the Kalman filter
can be constructed by altering only the noise covariance matrix, while
preserving the conjugacy of Gaussian distributions. Finally, we
demonstrate the effectiveness of our approach by recasting classic filtering
algorithms into convolutional versions, including the Kalman filter, extended
Kalman filter, unscented Kalman filter, and particle filter.
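
The claim that a robust Kalman filter needs only a change of noise covariance can be illustrated with a minimal sketch. It relies on the identity that a Gaussian density raised to a power a is proportional to a Gaussian with covariance divided by a. The exponents `alpha` and `beta` and the function name `kalman_step` are assumptions made for illustration; the abstract does not state how the exponents are chosen, so this is not necessarily the authors' exact algorithm.

```python
import numpy as np

def kalman_step(x, P, y, F, H, Q, R, alpha=1.0, beta=1.0):
    """One predict/update step of a linear Kalman filter with tempered noise models.

    alpha, beta in (0, 1] are hypothetical fractional exponents applied to the
    transition and output densities; alpha = beta = 1 gives the standard filter.
    """
    # N(.; m, S)^a is proportional to N(.; m, S / a), so a fractional power
    # only rescales the corresponding noise covariance.
    Q_eff = Q / alpha
    R_eff = R / beta

    # Prediction with the rescaled process noise.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q_eff

    # Measurement update with the rescaled measurement noise.
    S = H @ P_pred @ H.T + R_eff
    K = np.linalg.solve(S, H @ P_pred).T  # K = P_pred H^T S^{-1} (S, P_pred symmetric)
    x_new = x_pred + K @ (y - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```

In this sketch, an exponent below 1 inflates the effective covariance, so the filter places less trust in the corresponding model; the posterior stays Gaussian throughout.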

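By introducing an additional event that stipulates an inequality condition, we transform the conditional probability into a special integral analogous to convolution, yielding a more general framework than conventional Bayesian filtering, which we call convolutional Bayesian filtering. This framework includes standard Bayesian filtering as a special case when the distance metric of the inequality condition is chosen to be the Dirac delta function. By choosing different types of inequality conditions, we can account for model mismatch more comprehensively. Finally, we validate the effectiveness of our method by recasting classic filtering algorithms into convolutional versions, including the Kalman filter, extended Kalman filter, unscented Kalman filter, and particle filter.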

Convolutional Bayesian Filtering

This paper develops the first policy gradient method with a global optimality
guarantee and complexity analysis for robust reinforcement learning under model
mismatch. Robust reinforcement learning aims to learn a policy that is robust to
model mismatch between the simulator and the real environment. We first develop the
robust policy (sub-)gradient, which is applicable to any differentiable parametric
policy class. We show that the proposed robust policy gradient method converges
to the global optimum asymptotically under direct policy parameterization. We
further develop a smoothed robust policy gradient method and show that to
achieve an $\epsilon$-global optimum, the complexity is
$\mathcal{O}(\epsilon^{-3})$. We then extend our methodology to the general model-free
setting and design the robust actor-critic method with a differentiable
parametric policy class and value function. We further characterize its
asymptotic convergence and sample complexity under the tabular setting.
Finally, we provide simulation results to demonstrate the robustness of our
methods.
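
To make the notion of a policy's value under model mismatch concrete, the following is a minimal sketch of robust policy evaluation. It assumes an R-contamination uncertainty set, in which the true transition kernel is a mixture of the nominal kernel (weight 1 - R) and an arbitrary distribution (weight R); the abstract does not specify the uncertainty set, so this choice, the function name, and all parameters are assumptions. The sketch covers only worst-case evaluation of a fixed policy, not the policy gradient method itself.

```python
import numpy as np

def robust_policy_evaluation(P, r, pi, gamma=0.9, R=0.1, tol=1e-10, max_iter=10_000):
    """Worst-case value of a fixed policy under an assumed R-contamination set.

    P:  (S, A, S) nominal transition kernel
    r:  (S, A) rewards
    pi: (S, A) policy probabilities
    """
    V = np.zeros(P.shape[0])
    for _ in range(max_iter):
        # Worst case over the contamination set: the adversary moves a
        # fraction R of the probability mass onto the lowest-value state.
        worst_next = (1.0 - R) * (P @ V) + R * V.min()  # shape (S, A)
        V_new = np.sum(pi * (r + gamma * worst_next), axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V
```

With R = 0 this reduces to ordinary policy evaluation on the nominal model; a larger R yields a more pessimistic value estimate.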

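We develop a policy gradient method with a global optimality guarantee and complexity analysis for robust reinforcement learning under model mismatch, propose robust policy gradient and smoothed robust policy gradient methods, extend them to the general model-free setting, and provide simulation results demonstrating the robustness of the methods.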