Capturing aleatoric uncertainty is a critical part of many machine learning systems. In deep learning, a common approach to this end is to train a neural network to estimate the parameters of a heteroscedastic Gaussian distribution by maximizing the logarithm of the likelihood function under the observed data. In this work, we examine this approach and identify potential hazards associated with the use of log-likelihood in conjunction with gradient-based optimizers. First, we present a synthetic example illustrating how this approach can lead to very poor but stable parameter estimates. Second, we identify the culprit to be the log-likelihood loss, along with certain conditions that exacerbate the issue. Third, we present an alternative formulation, termed $\beta$-NLL, in which each data point's contribution to the loss is weighted by the $\beta$-exponentiated variance estimate. We show that using an appropriate $\beta$ largely mitigates the issue in our illustrative example. Fourth, we evaluate this approach on a range of domains and tasks and show that it achieves considerable improvements and is more robust to hyperparameter choices, in terms of both predictive RMSE and log-likelihood.
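
The weighting described above can be sketched for a single data point in plain Python. This is a minimal illustration, not the authors' implementation: the function names (`gaussian_nll`, `beta_nll`, `grad_mean`) are hypothetical, the default `beta=0.5` is only a placeholder, and the sketch assumes the weight $\sigma^{2\beta}$ is treated as a constant during differentiation (in an autodiff framework this would correspond to a stop-gradient or detach on the weight), which the abstract itself does not spell out.

```python
import math


def gaussian_nll(y, mean, var):
    """Negative log-likelihood of y under a Gaussian N(mean, var)."""
    return 0.5 * (math.log(var) + (y - mean) ** 2 / var + math.log(2 * math.pi))


def beta_nll(y, mean, var, beta=0.5):
    """Per-point NLL weighted by the beta-exponentiated variance, var**beta.

    Assumption: the weight acts as a constant that only rescales each point's
    gradient, so it must not be differentiated through during training.
    """
    weight = var ** beta
    return weight * gaussian_nll(y, mean, var)


def grad_mean(y, mean, var, beta=0.0):
    """Gradient of the weighted loss w.r.t. the mean, weight held constant."""
    return var ** beta * (mean - y) / var


# With beta = 0 the mean gradient is scaled by 1/var (plain NLL), so points
# with a large predicted variance contribute little to fitting the mean.
print(grad_mean(y=1.0, mean=3.0, var=4.0, beta=0.0))  # 0.5
# With beta = 1 the variance cancels and the gradient equals (mean - y),
# the same as for a squared-error loss up to a constant factor.
print(grad_mean(y=1.0, mean=3.0, var=4.0, beta=1.0))  # 2.0
```

Under these assumptions, intermediate values of `beta` interpolate between the two behaviours: the larger `beta` is, the less a point's predicted variance suppresses its contribution to the mean's gradient.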

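This paper examines the common deep learning approach of estimating the parameters of a heteroscedastic Gaussian distribution and proposes an alternative, termed $\beta$-NLL, that mitigates the difficulties that arise when gradient-based optimizers are used together with a log-likelihood loss. Across a range of domains and tasks, the alternative yields considerable improvements and greater robustness, as measured by predictive RMSE and log-likelihood.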

On the Pitfalls of Heteroscedastic Uncertainty Estimation with Probabilistic Neural Networks