Rectified Linear Units (ReLUs) have been shown to ameliorate the vanishing gradient problem, allow for efficient back-propagation, and empirically promote sparsity in the learned parameters. Their use has led to state-of-the-art results in a variety of applications. In this paper, we characterize the expressiveness of ReLU networks. From this perspective, ReLU networks are less explored than networks with sign (threshold) or sigmoid activations. We show that, while the decision boundary of a two-layer ReLU network can be captured by a sign network, the sign network can require an exponentially larger number of hidden units. Furthermore, we formulate sufficient conditions under which a sign network can be represented as a ReLU network with a corresponding logarithmic reduction in the number of hidden units. Finally, using synthetic data, we experimentally demonstrate that back-propagation can recover the much smaller ReLU networks as predicted by the theory.

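Prior work has shown that rectified linear units (ReLUs) ameliorate the vanishing gradient problem, allow for efficient back-propagation, and promote sparsity in the learned parameters. This paper studies the decision boundaries of ReLU networks from the perspective of expressiveness. It shows that the decision boundary of a two-layer ReLU network can be captured by a threshold (sign) network, although the latter may require exponentially more hidden units. It also formulates sufficient conditions under which a sign network can be represented as a ReLU network with a logarithmic reduction in the number of hidden units. Finally, experiments on synthetic data compare ReLU networks with threshold networks and demonstrate that back-propagation can recover the much smaller ReLU networks predicted by the theory.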
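The claim that a two-layer ReLU network's decision boundary can be captured by an exponentially larger sign network can be illustrated with a small numerical sketch. It is not necessarily the paper's own construction. It assumes, purely for illustration, that every output weight of the ReLU network is +1, and it relies on the identity that a sum of rectified values equals the maximum, over all subsets S of hidden units, of the sum of the pre-activations in S. Under that assumption the ReLU decision is the OR of 2^n threshold units, one per subset. The function names and the parameters `W`, `b`, and `w0` are hypothetical.

```python
import itertools

import numpy as np


def relu_decision(x, W, b, w0):
    """Decision of a two-layer ReLU network with unit output weights (assumed)."""
    pre = W @ x + b  # hidden pre-activations a_k(x)
    return bool(w0 + np.maximum(pre, 0.0).sum() >= 0)


def threshold_decision(x, W, b, w0):
    """Same decision from threshold units: one unit per subset S, combined by OR."""
    pre = W @ x + b
    n = len(pre)
    for r in range(n + 1):
        for S in itertools.combinations(range(n), r):
            # The threshold unit for subset S fires iff w0 + sum_{k in S} a_k(x) >= 0.
            if w0 + sum(pre[k] for k in S) >= 0:
                return True
    return False


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W, b, w0 = rng.normal(size=(3, 2)), rng.normal(size=3), -1.0
    agree = all(
        relu_decision(x, W, b, w0) == threshold_decision(x, W, b, w0)
        for x in rng.normal(size=(200, 2))
    )
    print("decisions agree:", agree, "| threshold units:", 2 ** W.shape[0])
```

With n ReLU hidden units the threshold version enumerates 2^n subsets, which makes the exponential gap concrete. The sketch does not show that such a blow-up is unavoidable; that is the subject of the paper's analysis.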

Expressiveness of Rectifier Networks