We explore the recently proposed variational dropout technique, which provides an elegant Bayesian interpretation of dropout. We extend variational dropout to the case where the dropout rate is unknown and show that it can be found by optimizing the variational evidence lower bound. We show that it is possible to assign and learn an individual dropout rate for each connection in a DNN. Interestingly, such an assignment leads to extremely sparse solutions in both fully-connected and convolutional layers. This effect is similar to the automatic relevance determination (ARD) effect in empirical Bayes but has a number of advantages. We report up to 128-fold compression of popular architectures without a large loss of accuracy, providing additional evidence that modern deep architectures are highly redundant.

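We explore the variational dropout technique, which provides an elegant Bayesian interpretation of Gaussian dropout. We extend it to the case where dropout rates are unbounded, propose a way to reduce the variance of the gradient estimator, and report the first experimental results with individual dropout rates per weight. Interestingly, this leads to extremely sparse solutions in both fully-connected and convolutional layers. This effect is similar to the automatic relevance determination effect in empirical Bayes but has a number of advantages. We reduce the number of parameters by up to 280 times on LeNet architectures and by up to 68 times on VGG-like networks with almost no decrease in accuracy.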
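
As a rough illustration of how per-weight dropout rates translate into sparsity and a compression ratio, the following minimal sketch applies multiplicative Gaussian noise to a single weight and then removes weights whose learned dropout rate is large. The function names, the toy values, and the pruning threshold of 3.0 on the log dropout rate are assumptions made for this example; they are not stated in the abstract.

```python
import math
import random

def sample_noisy_weight(theta, log_alpha, rng):
    """Gaussian dropout on one weight: w = theta * (1 + sqrt(alpha) * eps), eps ~ N(0, 1)."""
    alpha = math.exp(log_alpha)
    return theta * (1.0 + math.sqrt(alpha) * rng.gauss(0.0, 1.0))

def prune(thetas, log_alphas, threshold=3.0):
    """Zero out weights whose log dropout rate exceeds the threshold.

    The threshold value is an assumption chosen for illustration.
    """
    return [0.0 if la > threshold else t for t, la in zip(thetas, log_alphas)]

def compression_ratio(weights):
    """Total number of weights divided by the number of nonzero weights."""
    nonzero = sum(1 for w in weights if w != 0.0)
    return len(weights) / nonzero if nonzero else float("inf")

# Toy values (hypothetical): two weights with small dropout rates, two with large ones.
thetas = [0.8, -0.05, 0.3, 0.01]
log_alphas = [-2.0, 5.0, 0.5, 7.0]

sparse = prune(thetas, log_alphas)
ratio = compression_ratio(sparse)
```

A weight with a very large dropout rate is dominated by noise, so dropping it changes little; counting the surviving weights gives the kind of compression factor quoted above.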

Variational Dropout Sparsifies Deep Neural Networks