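TL;DR: This paper proposes a high-performing neural network activation function, the Gaussian Error Linear Unit (GELU). Its nonlinearity outperforms ReLU and ELU, and it improves performance across all of the computer vision, natural language processing, and speech tasks considered.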
Abstract
We propose the Gaussian Error Linear Unit (GELU), a high-performing neural
network activation function. The GELU activation function is $x\Phi(x)$, where
$\Phi(x)$ is the standard Gaussian cumulative distribution function. The GELU
nonlinearity weights inputs by their value, rather than g