For a long time, we have combated over-fitting in the CNN training process
with model regularization, including weight decay, model averaging, and data
augmentation. In this paper, we present DisturbLabel, an extremely simple
algorithm which randomly replaces a portion of the labels with incorrect
values in each iteration. Although it seems weird to intentionally
generate incorrect training labels, we show that DisturbLabel prevents the
network training from over-fitting by implicitly averaging over exponentially
many networks which are trained with different label sets. To the best of our
knowledge, DisturbLabel is the first work to add noise on the loss
layer. Moreover, DisturbLabel works well together with Dropout, providing
complementary regularization. Experiments demonstrate competitive
recognition results on several popular image recognition datasets.
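
The label replacement step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `disturb_labels`, the noise rate `alpha`, and the choice to draw replacements uniformly from all classes (so a draw may occasionally coincide with the true label) are assumptions made for the example.

```python
import random

def disturb_labels(labels, num_classes, alpha, rng):
    """Return a copy of `labels` in which each entry is, with probability
    `alpha`, replaced by a class index drawn uniformly from all classes.

    `alpha` and the uniform replacement rule are assumptions of this sketch.
    """
    disturbed = []
    for y in labels:
        if rng.random() < alpha:
            # The uniform draw can return the true class, so the fraction of
            # labels that actually change is somewhat below alpha on average.
            disturbed.append(rng.randrange(num_classes))
        else:
            disturbed.append(y)
    return disturbed

rng = random.Random(0)
labels = [0, 1, 2, 3, 4] * 4

for iteration in range(3):
    # Fresh noise is drawn in every iteration, so each iteration effectively
    # trains against a different label set.
    noisy = disturb_labels(labels, num_classes=5, alpha=0.2, rng=rng)
    changed = sum(1 for a, b in zip(labels, noisy) if a != b)
    print(iteration, changed)
```

In a training loop, the disturbed labels would be passed to the loss in place of the clean ones, while the clean labels stay untouched for evaluation.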

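This paper proposes the DisturbLabel algorithm, which randomly replaces a portion of the labels with incorrect values in each iteration to keep network training from over-fitting, and reports competitive recognition results on several popular image recognition datasets.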

DisturbLabel: Regularizing CNN on the Loss Layer

Convolutional neural networks (CNNs) work well on large datasets. But
labelled data is hard to collect, and in some applications larger amounts of
data are not available. The problem then is how to use CNNs with small data --
as CNNs overfit quickly. We present an efficient Bayesian CNN, offering better
robustness to over-fitting on small data than traditional approaches. This is
done by placing a probability distribution over the CNN's kernels. We approximate
our model's intractable posterior with Bernoulli variational distributions,
requiring no additional model parameters.
On the theoretical side, we cast dropout network training as approximate
inference in Bayesian neural networks. This allows us to implement our model
using existing tools in deep learning with no increase in time complexity,
while highlighting a negative result in the field. We show a considerable
improvement in classification accuracy compared to standard techniques and
improve on published state-of-the-art results for CIFAR-10.
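
To make the link between dropout and approximate inference more concrete, the sketch below averages several stochastic forward passes, each using a fresh Bernoulli mask. Several things here are assumptions for illustration and are not stated in the abstract: the averaging procedure at prediction time, the toy linear-softmax model standing in for a CNN, and the names `stochastic_forward`, `mc_dropout_predict`, `keep_prob`, and `num_samples`.

```python
import math
import random

def stochastic_forward(x, weights, keep_prob, rng):
    """One forward pass of a toy linear-softmax model with a Bernoulli mask
    applied to the inputs (a stand-in for masking a CNN's kernels)."""
    # Keep each input with probability keep_prob and rescale the survivors.
    masked = [xi / keep_prob if rng.random() < keep_prob else 0.0 for xi in x]
    logits = [sum(w * m for w, m in zip(row, masked)) for row in weights]
    # Numerically stable softmax.
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mc_dropout_predict(x, weights, keep_prob=0.5, num_samples=100, seed=0):
    """Average class probabilities over num_samples stochastic passes."""
    rng = random.Random(seed)
    mean = [0.0] * len(weights)
    for _ in range(num_samples):
        probs = stochastic_forward(x, weights, keep_prob, rng)
        mean = [m + p / num_samples for m, p in zip(mean, probs)]
    return mean

# Hypothetical 2-class model with 3 input features.
weights = [[0.5, -0.2, 0.1],
           [-0.3, 0.4, 0.2]]
x = [1.0, 2.0, -1.0]
print(mc_dropout_predict(x, weights))
```

Because the masks are the only source of randomness, no parameters are added beyond the model's own weights, in line with the abstract's remark that the variational distributions require no additional model parameters.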

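By placing a probability distribution over the CNN's kernels, the model approximates its intractable posterior with Bernoulli variational distributions and casts dropout network training as approximate inference in Bayesian neural networks. Compared with standard techniques, the model is more robust to over-fitting on small data and shows a considerable improvement in classification accuracy, improving on published state-of-the-art results for CIFAR-10.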

Bayesian Convolutional Neural Networks with Bernoulli Approximate Variational Inference

The Hierarchical Mixture of Experts (HME) is a well-known tree-based model
for regression and classification, based on soft probabilistic splits. In its
original formulation it was trained by maximum likelihood, and is therefore
prone to over-fitting. Furthermore, the maximum likelihood framework offers no
natural metric for optimizing the complexity and structure of the tree.
Previous attempts to provide a Bayesian treatment of the HME model have relied
either on ad-hoc local Gaussian approximations or have dealt with related
models representing the joint distribution of both input and output variables.
In this paper we describe a fully Bayesian treatment of the HME model based on
variational inference. By combining local and global variational methods we
obtain a rigorous lower bound on the marginal probability of the data under
the model. This bound is optimized during the training phase, and its resulting
value can be used for model order selection. We present results using this
approach for a data set describing robot arm kinematics.
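
The "soft probabilistic splits" mentioned above can be illustrated with a small prediction routine. This sketch covers only the forward computation of a tree of gated linear experts for regression. It does not implement the variational treatment the abstract describes, and the dictionary layout, the binary sigmoid gates, and the names `hme_predict`, `gate`, and `expert` are assumptions chosen for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def hme_predict(node, x):
    """Recursively compute the mixture prediction of a binary HME tree.

    A leaf holds a linear expert. An internal node holds a gate whose
    sigmoid output is the probability of routing x to the left child.
    """
    if "expert" in node:
        return dot(node["expert"], x)
    g = sigmoid(dot(node["gate"], x))
    # Soft split: both subtrees contribute, weighted by the gate.
    return g * hme_predict(node["left"], x) + (1.0 - g) * hme_predict(node["right"], x)

# Hypothetical two-level tree; x[0] = 1.0 acts as a bias term.
tree = {
    "gate": [0.0, 0.0],
    "left": {"expert": [0.0, 1.0]},
    "right": {
        "gate": [0.0, 1.0],
        "left": {"expert": [1.0, 0.0]},
        "right": {"expert": [-1.0, 0.5]},
    },
}
print(hme_predict(tree, [1.0, 2.0]))
```

Unlike a hard decision tree, every expert contributes to every prediction, which is what makes the model differentiable in its gate parameters.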

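This paper proposes a fully Bayesian treatment of the HME model based on variational inference, and obtains a rigorous lower bound on the marginal probability of the data under the model by combining local and global variational methods.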