This report explores the theory that explains the high sparsity phenomenon
\citep{tosato2023emergent} observed in the forward-forward algorithm
\citep{hinton2022forward}. The two proposed theorems predict how the sparsity
of a single data point's activation changes in two cases. Theorem
\ref{theorem:1} covers decreasing the goodness of the whole batch. Theorem
\ref{theorem:2} covers applying the complete forward-forward algorithm, which
decreases the goodness for negative data and increases it for positive data.
The theory agrees well with experiments on the MNIST dataset.

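This paper examines a theory that explains the high sparsity observed in the forward-forward algorithm and proposes two theorems that predict how the activation sparsity of a single data point changes; the theory agrees with experiments on the MNIST dataset.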
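
To make the two quantities in the abstract above concrete, the following is a minimal sketch rather than the authors' code. It assumes goodness is the sum of squared activations, as in \citep{hinton2022forward}, and it assumes Hoyer's measure as the sparsity metric; the abstract does not state which metric the theorems use, so that choice and all function names are illustrative.

```python
import numpy as np

def goodness(h):
    """Sum of squared activations for each row of h."""
    return np.sum(h ** 2, axis=-1)

def hoyer_sparsity(h, eps=1e-12):
    """Hoyer sparsity per row (assumed metric): 0 for a flat vector,
    1 when exactly one unit is nonzero."""
    n = h.shape[-1]
    l1 = np.sum(np.abs(h), axis=-1)
    l2 = np.sqrt(np.sum(h ** 2, axis=-1))
    return (np.sqrt(n) - l1 / (l2 + eps)) / (np.sqrt(n) - 1)

# Two activations with identical goodness but very different sparsity.
dense = np.ones((1, 4))
sparse = np.array([[2.0, 0.0, 0.0, 0.0]])

print(goodness(dense), goodness(sparse))
print(hoyer_sparsity(dense), hoyer_sparsity(sparse))
```

The example shows that goodness alone does not determine sparsity: both vectors have the same goodness, yet one is fully dense and the other is maximally sparse under this metric.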

A theory for the sparsity emerged in the Forward Forward algorithm

The demand for efficient processing of deep neural networks (DNNs) on
embedded devices is a significant challenge limiting their deployment.
Exploiting sparsity in the network's feature maps is one of the ways to reduce
its inference latency. Unstructured sparsity is known to degrade accuracy less
than structured sparsity, but it requires extensive inference-engine changes
to deliver latency benefits. To tackle this
challenge, we propose a solution to induce semi-structured activation sparsity
exploitable through minor runtime modifications. To attain high speedup levels
at inference time, we design a sparse training procedure with awareness of the
final position of the activations while computing the General Matrix
Multiplication (GEMM). We extensively evaluate the proposed solution across
various models for image classification and object detection tasks. Remarkably,
our approach yields a speedup of $1.25\times$ with an accuracy drop of only
$1.1\%$ for the ResNet18 model on the ImageNet dataset.
Furthermore, when combined with a state-of-the-art structured pruning method,
the resulting models provide a good latency-accuracy trade-off, outperforming
models that solely employ structured pruning techniques.

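By accounting for the final position of the activations when computing the General Matrix Multiplication (GEMM), we design a sparse training procedure that induces exploitable semi-structured activation sparsity and evaluate it extensively on image classification and object detection tasks. On the ImageNet dataset, our method achieves a $1.25\times$ speedup for the ResNet18 model with an accuracy drop of only $1.1\%$; combined with a state-of-the-art structured pruning method, the resulting models offer a good latency-accuracy trade-off and outperform models that use structured pruning alone.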
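
The idea of activation sparsity that a GEMM can exploit may be illustrated with the sketch below. It is not the paper's procedure: it assumes the activations are already laid out as a matrix with one column per output position, and it assumes a simple rule that zeroes the lowest-norm columns. The function names and the selection rule are hypothetical.

```python
import numpy as np

def column_mask(a, sparsity):
    """Boolean keep-mask over the columns of a, dropping the lowest-norm
    fraction given by sparsity (assumed selection rule)."""
    n_cols = a.shape[1]
    n_drop = int(sparsity * n_cols)
    norms = np.linalg.norm(a, axis=0)
    keep = np.ones(n_cols, dtype=bool)
    keep[np.argsort(norms)[:n_drop]] = False
    return keep

def sparse_gemm(w, a, keep):
    """Multiply only the kept columns; dropped columns yield zero outputs."""
    out = np.zeros((w.shape[0], a.shape[1]), dtype=np.result_type(w, a))
    out[:, keep] = w @ a[:, keep]
    return out

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 27))    # 8 filters, 3x3x3 receptive field
a = rng.standard_normal((27, 16))   # 16 output positions
keep = column_mask(a, 0.25)

dense_out = w @ (a * keep)          # masked activations, full GEMM
fast_out = sparse_gemm(w, a, keep)  # smaller GEMM on kept columns only
print(keep.sum(), np.allclose(dense_out, fast_out))
```

Because whole columns are zeroed, the reduced multiplication gives the same result as the full one on masked activations, which is why this kind of structure needs only minor runtime changes.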

Accelerating Deep Neural Networks via Semi-Structured Activation Sparsity

The study of adversarial examples and their activation has attracted
significant attention for secure and robust learning with deep neural networks
(DNNs). Different from existing works, in this paper, we highlight two new
characteristics of adversarial examples from the channel-wise activation
perspective: 1) the activation magnitudes of adversarial examples are higher
than those of natural examples; and 2) adversarial examples activate the
channels more uniformly than natural examples do. We find that the
state-of-the-art defense, adversarial training, addresses the first issue of
high activation magnitudes by training on adversarial examples, while the
second issue of uniform activation remains. This motivates us to suppress
redundant activations triggered by adversarial perturbations via a
Channel-wise Activation Suppressing (CAS) strategy. We show that CAS can train
a model that inherently suppresses adversarial activation, and can be easily
applied to existing defense methods to further improve their robustness. Our
work provides a simple but generic training strategy for robustifying the
intermediate layer activation of DNNs.

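This paper analyzes the characteristics of adversarial perturbations from the channel perspective and proposes a Channel-wise Activation Suppressing (CAS) strategy to train a model that inherently suppresses adversarial activation, further improving the robustness of existing defense methods.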
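
The two channel-wise characteristics described above, magnitude and uniformity, can be measured along the following lines. This is a sketch under assumptions, not the CAS implementation: uniformity is measured here with a normalized entropy, and the per-channel suppression weights are a hypothetical input, since the abstract does not say how CAS derives them.

```python
import numpy as np

def channel_magnitude(feat):
    """Mean activation per channel for features of shape (N, C, H, W)."""
    return feat.mean(axis=(0, 2, 3))

def channel_uniformity(feat, eps=1e-12):
    """Normalized entropy of the channel magnitudes (assumed measure):
    close to 1 when all channels fire equally, close to 0 when one dominates."""
    m = np.abs(channel_magnitude(feat))
    p = m / (m.sum() + eps)
    entropy = -np.sum(p * np.log(p + eps))
    return entropy / np.log(len(p))

def suppress(feat, channel_weights):
    """Rescale each channel by a weight in [0, 1] (hypothetical weights)."""
    return feat * channel_weights[None, :, None, None]

uniform = np.ones((2, 4, 3, 3))     # every channel equally active
peaked = np.zeros((2, 4, 3, 3))
peaked[:, 0] = 1.0                  # only channel 0 is active

print(channel_uniformity(uniform), channel_uniformity(peaked))

weights = np.array([1.0, 0.0, 0.0, 0.0])
print(channel_magnitude(suppress(uniform, weights)))
```

Suppressing three of the four channels turns the uniformly activated feature map into the peaked one, which is the direction of change the abstract argues for.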

Improving Adversarial Robustness via Channel-wise Activation Suppressing

The purported "black box" nature of neural networks is a barrier to adoption
in applications where interpretability is essential. Here we present DeepLIFT
(Deep Learning Important FeaTures), a method for decomposing the output
prediction of a neural network on a specific input by backpropagating the
contributions of all neurons in the network to every feature of the input.
DeepLIFT compares the activation of each neuron to its 'reference activation'
and assigns contribution scores according to the difference. By optionally
giving separate consideration to positive and negative contributions, DeepLIFT
can also reveal dependencies which are missed by other approaches. Scores can
be computed efficiently in a single backward pass. We apply DeepLIFT to models
trained on MNIST and simulated genomic data, and show significant advantages
over gradient-based methods. A video tutorial, ICML slides
(bit.ly/deeplifticmlslides), an ICML talk, and code are available online.

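DeepLIFT is a method that uses backpropagation to decompose a neural network's output prediction for a specific input into the contributions of all neurons in the network to each input feature; it improves interpretability and can reveal dependencies that other methods miss.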
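
The difference-from-reference idea can be sketched for a single ReLU neuron. The code below is an illustration, not the DeepLIFT library: it assumes the linear rule for the weighted sum and a rescale-style multiplier (output difference divided by pre-activation difference) for the nonlinearity. The function name and the example values are hypothetical.

```python
import numpy as np

def deeplift_relu_neuron(w, b, x, x_ref):
    """Contribution of each input to the change in y = relu(w @ x + b)
    relative to the reference input x_ref."""
    z, z_ref = w @ x + b, w @ x_ref + b
    y, y_ref = max(z, 0.0), max(z_ref, 0.0)
    dz, dy = z - z_ref, y - y_ref
    # Multiplier through the ReLU; fall back to the gradient when dz is ~0.
    mult = dy / dz if abs(dz) > 1e-12 else float(z > 0)
    return w * (x - x_ref) * mult, dy

w = np.array([1.0, 1.0])
x = np.array([-1.0, -1.0])      # the neuron is inactive here, so its gradient is 0
x_ref = np.array([1.0, 1.0])    # the reference activates the neuron

contribs, dy = deeplift_relu_neuron(w, 0.0, x, x_ref)
print(contribs, contribs.sum(), dy)
```

At this input the gradient of the neuron is zero, so a gradient-based score would assign no importance to either feature, while the reference-based contributions are nonzero and sum to the output difference.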