In this paper, we study the performance of variants of well-known
Convolutional Neural Network (CNN) architectures on different audio tasks. We
show that tuning the Receptive Field (RF) of CNNs is crucial to their
generalization. An insufficient RF limits the CNN's ability to fit the training
data. In contrast, CNNs with an excessive RF tend to over-fit the training data
and fail to generalize to unseen testing data. As state-of-the-art CNN
architectures-in computer vision and other domains-tend to go deeper in terms
of number of layers, their RF size increases and therefore they degrade in
performance in several audio classification and tagging tasks. We study
well-known CNN architectures and how their building blocks affect their
receptive field. We propose several systematic approaches to control the RF of
CNNs and systematically test the resulting architectures on different audio
classification and tagging tasks and datasets. The experiments show that
regularizing the RF of CNNs using our proposed approaches can drastically
improve the generalization of models, out-performing complex architectures and
pre-trained models on larger datasets. The proposed CNNs achieve
state-of-the-art results in multiple tasks, from acoustic scene classification
to emotion and theme detection in music to instrument recognition, as
demonstrated by top ranks in several pertinent challenges (DCASE, MediaEval).

本文研究了不同音频任务中已知卷积神经网络架构的变体性能。结果表明，调整卷积神经网络的感受野对其广义性至关重要。通过跨越多个音频分类和标记任务进行系统测试，我们提出了几种系统方法来控制 CNN 的 RF，并表明使用我们提出的方法对 CNN 的 RF 进行正则化可以显着提高模型的广义性，在多项任务中取得了最优结果。

使用深度卷积神经网络进行音频分类和标记的接收域正则化技术

Receptive Field Regularization Techniques for Audio Classification and  Tagging with Deep Convolutional Neural Networks

Variants dropout methods have been designed for the fully-connected layer,
convolutional layer and recurrent layer in neural networks, and shown to be
effective to avoid overfitting. As an appealing alternative to recurrent and
convolutional layers, the fully-connected self-attention layer surprisingly
lacks a specific dropout method. This paper explores the possibility of
regularizing the attention weights in Transformers to prevent different
contextualized feature vectors from co-adaption. Experiments on a wide range of
tasks show that DropAttention can improve performance and reduce overfitting.

探索在 Transformers 中规范化注意权重以防止过度拟合，并表明 DropAttention 能够提高性能并减少过度拟合。

DropAttention: 一种全连接自注意力网络的正则化方法

DropAttention: A Regularization Method for Fully-Connected  Self-Attention Networks

We study the topmost weight matrix of neural network language models. We show
that this matrix constitutes a valid word embedding. When training language
models, we recommend tying the input embedding and this output embedding. We
analyze the resulting update rules and show that the tied embedding evolves in
a more similar way to the output embedding than to the input embedding in the
untied model. We also offer a new method of regularizing the output embedding.
Our methods lead to a significant reduction in perplexity, as we are able to
show on a variety of neural network language models. Finally, we show that
weight tying can reduce the size of neural translation models to less than half
of their original size without harming their performance.

本文研究神经网络语言模型的最高权重矩阵，表明这个矩阵构成有效的单词嵌入，建议绑定输入嵌入和输出嵌入的训练方法并提供新的输出嵌入规则，这些方法能够显著降低困惑度并在不影响性能的情况下减少神经翻译模型的尺寸。

使用输出嵌入来改进语言模型

Using the Output Embedding to Improve Language Models

We propose zoneout, a novel method for regularizing RNNs. At each timestep,
zoneout stochastically forces some hidden units to maintain their previous
values. Like dropout, zoneout uses random noise to train a pseudo-ensemble,
improving generalization. But by preserving instead of dropping hidden units,
gradient information and state information are more readily propagated through
time, as in feedforward stochastic depth networks. We perform an empirical
investigation of various RNN regularizers, and find that zoneout gives
significant performance improvements across tasks. We achieve competitive
results with relatively simple models in character- and word-level language
modelling on the Penn Treebank and Text8 datasets, and combining with recurrent
batch normalization yields state-of-the-art results on permuted sequential
MNIST.

本文提出了一种名为 zoneout 的新型 RNN 正则化方法，其通过异步化一部分隐藏单元来提高模型的泛化能力，并在字符和词级语言建模的任务中获得了竞争性的结果。