The learning of deep models, in which a numerous of parameters are superimposed, is known to be a fairly sensitive process and should be carefully done through a combination of several techniques that can help to stabilize it. We introduce an additional challenge that has never been explicitly studied: the heterogeneity of sparsity at the instance level due to missing values or the innate nature of the input distribution. We confirm experimentally on the widely used benchmark datasets that this variable sparsity problem makes the output statistics of neurons unstable and makes the learning process more difficult by saturating non-linearities. We also provide the analysis of this phenomenon, and based on our analysis, we present a simple technique to prevent this issue, referred to as Sparsity Normalization (SN). Finally, we show that the performance can be significantly improved with SN on certain popular benchmark datasets, or that similar performance can be achieved with lower capacity. Especially focusing on the collaborative filtering problem where the variable sparsity issue has been completely ignored, we achieve new state-of-the-art results on Movielens 100k and 1M datasets, by simply applying Sparsity Normalization (SN).

本研究提出变量稀疏问题，并通过实验证明了零值归零处理方法在处理缺失数据时性能不佳，提出了一种名为 Sparsity Normalization 的技术，该技术直接解决了变量稀疏问题，并在多个基准数据集上得到了实验验证。

神经网络训练中的稀疏偏差校正：为何不使用0填充？