Deep neural networks (DNNs) have set state of the art results in many machine learning and NLP tasks. However, we do not have a strong understanding of what DNN models learn. In this paper, we examine learning in DNNs through analysis of their outputs. We compare DNN performance directly to a human population, and use characteristics of individual data points such as difficulty to see how well models perform on easy and hard examples. We investigate how training size and the incorporation of noise affect a DNN's ability to generalize and learn. Our experiments show that unlike traditional machine learning models (e.g., Naive Bayes, Decision Trees), DNNs exhibit human-like learning properties. As they are trained with more data, they are more able to distinguish between easy and difficult items, and performance on easy items improves at a higher rate than difficult items. We find that different DNN models exhibit different strengths in learning and are robust to noise in training data.

研究了深度学习模型性能评估中忽略的数据点特征和难度对测试集准确性的影响，通过用已有的心理测量学方法对人类的反应模式进行建模来估计难度，实验结果发现难度对于测试的结果有重要影响，同时易于学习的实例被模型学得更快。

通过考察测试集难度理解深度学习性能：一项心理测量案例研究