Probabilistic verification of neural networks is concerned with formally
analysing the output distribution of a neural network under a probability
distribution of the inputs. Examples of probabilistic verification include
verifying the demographic parity fairness notion or quantifying the safety of a
neural network. We present a new algorithm for the probabilistic verification
of neural networks that computes and iteratively refines lower and upper
bounds on probabilities over the outputs of a neural network.
By applying state-of-the-art bound propagation and branch and bound techniques
from non-probabilistic neural network verification, our algorithm significantly
outpaces existing probabilistic verification algorithms, reducing solving times
for various benchmarks from the literature from tens of minutes to tens of
seconds. Furthermore, our algorithm compares favourably even to dedicated
algorithms for restricted subsets of probabilistic verification. We complement
our empirical evaluation with a theoretical analysis, proving that our
algorithm is sound and, under mildly restrictive conditions, also complete when
using a suitable set of heuristics.
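The idea of iteratively refining lower and upper probability bounds can be illustrated with a minimal sketch. It is not the algorithm from the abstract: the tiny ReLU network (`W1`, `B1`, `W2`, `B2`), the uniform input distribution, the use of plain interval bound propagation, and the "split the largest undecided box first" heuristic are all assumptions chosen for illustration. The sketch bounds P(f(x) >= 0) by accumulating the probability mass of boxes proven to satisfy the property and tracking the mass of boxes that are still undecided.

```python
import heapq
import itertools

# Hypothetical toy network: 2 inputs -> 2 ReLU units -> 1 output.
W1 = [[1.0, -1.0], [0.5, 1.0]]
B1 = [0.0, -0.5]
W2 = [1.0, -1.0]
B2 = -0.1


def forward(x):
    """Evaluate the toy network at a single point."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, B1)]
    return sum(w * h for w, h in zip(W2, hidden)) + B2


def affine_bounds(box, weights, bias):
    """Interval bounds of w.x + b over a box of (low, high) pairs."""
    lo = hi = bias
    for (l, h), w in zip(box, weights):
        lo += w * l if w >= 0 else w * h
        hi += w * h if w >= 0 else w * l
    return lo, hi


def output_bounds(box):
    """Interval bound propagation through the toy ReLU network."""
    hidden = []
    for row, b in zip(W1, B1):
        lo, hi = affine_bounds(box, row, b)
        hidden.append((max(lo, 0.0), max(hi, 0.0)))  # ReLU is monotone
    return affine_bounds(hidden, W2, B2)


def volume(box):
    v = 1.0
    for l, h in box:
        v *= h - l
    return v


def split(box):
    """Bisect a box along its widest dimension."""
    d = max(range(len(box)), key=lambda i: box[i][1] - box[i][0])
    l, h = box[d]
    mid = (l + h) / 2.0
    left, right = list(box), list(box)
    left[d], right[d] = (l, mid), (mid, h)
    return left, right


def probability_bounds(root, max_splits=2000):
    """Lower and upper bounds on P(f(x) >= 0) for x uniform on `root`."""
    total = volume(root)
    proven = 0.0               # mass of boxes where f >= 0 everywhere
    tie = itertools.count()    # tie-breaker so the heap never compares boxes
    queue = []                 # undecided boxes, largest volume first

    def classify(box):
        nonlocal proven
        lo, hi = output_bounds(box)
        if lo >= 0.0:
            proven += volume(box)
        elif hi >= 0.0:
            heapq.heappush(queue, (-volume(box), next(tie), box))
        # hi < 0: the property fails on the whole box, so it adds no mass

    classify(root)
    for _ in range(max_splits):
        if not queue:
            break
        _, _, box = heapq.heappop(queue)
        for child in split(box):
            classify(child)

    lower = proven / total
    upper = lower + sum(-neg_vol for neg_vol, _, _ in queue) / total
    return lower, upper


if __name__ == "__main__":
    print(probability_bounds([(0.0, 1.0), (0.0, 1.0)]))
```

Each split can only move mass out of the undecided set, so the gap between the two bounds never grows as the split budget increases. A real verifier would presumably use tighter bound propagation and better splitting heuristics than this sketch.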

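Verifies probabilities over the output distribution of a neural network by computing and iteratively refining lower and upper bounds on output probabilities with a suitable set of heuristics; applying state-of-the-art bound propagation and branch and bound techniques from non-probabilistic neural network verification significantly reduces solving times.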

Probabilistic Verification of Neural Networks using Branch and Bound

The success of Reinforcement Learning from Human Feedback (RLHF) in language
model alignment is critically dependent on the capability of the reward model
(RM). However, as the training process progresses, the output distribution of
the policy model shifts, leading to the RM's reduced ability to distinguish
between responses. This issue is further compounded when the RM, trained on a
specific data distribution, struggles to generalize to examples outside of that
distribution. These two issues can be united as a challenge posed by the
shifted distribution of the environment. To surmount this challenge, we
introduce MetaRM, a method leveraging meta-learning to align the RM with the
shifted environment distribution. MetaRM trains the RM by minimizing the loss
on data, particularly data that can improve its ability to differentiate
examples from the shifted target distribution.
Extensive experiments demonstrate that MetaRM significantly improves the RM's
distinguishing ability in iterative RLHF optimization, and also provides the
capacity to identify subtle differences in out-of-distribution samples.

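Uses meta-learning to address the problem that, under a shifting environment distribution, the reward model in reinforcement learning struggles to distinguish between responses and to generalize to new examples.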

MetaRM: Shifted Distributions Alignment via Meta-Learning

Is In-Context Learning (ICL) implicitly equivalent to Gradient Descent (GD)?
Several recent works draw analogies between the dynamics of GD and the emergent
behavior of ICL in large language models. However, these works make assumptions
far from the realistic natural language setting in which language models are
trained. Such discrepancies between theory and practice, therefore, necessitate
further investigation to validate their applicability.
We start by highlighting the weaknesses in prior works that construct
Transformer weights to simulate gradient descent. Their experiments with
training Transformers on an ICL objective, inconsistencies in the order
sensitivity of ICL and GD, sparsity of the constructed weights, and sensitivity
to parameter changes are some examples of a mismatch with the real-world
setting.
Furthermore, we probe and compare the ICL vs. GD hypothesis in a natural
setting. We conduct comprehensive empirical analyses on language models
pretrained on natural data (LLaMa-7B). Our comparisons on various performance
metrics highlight the inconsistent behavior of ICL and GD as a function of
various factors such as datasets, models, and number of demonstrations. We
observe that ICL and GD adapt the output distribution of language models
differently. These results indicate that the equivalence between ICL and GD
remains an open hypothesis that requires nuanced consideration and calls for
further study.

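In a realistic natural language setting, compares the behavior of In-Context Learning (ICL) and Gradient Descent (GD) on language models and finds that the two adapt the output distribution of language models inconsistently.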

Do pretrained Transformers Really Learn In-context by Gradient Descent?

The output distribution of a neural network (NN) over the entire input space
captures the complete input-output mapping relationship, offering insights
toward a more comprehensive NN understanding. Exhaustive enumeration or
traditional Monte Carlo methods over the entire input space can require
impractical sampling time, especially for high-dimensional inputs. To make such
difficult sampling computationally feasible, in this paper, we propose a novel
Gradient-based Wang-Landau (GWL) sampler. We first draw the connection between
the output distribution of an NN and the density of states (DOS) of a physical
system. Then, we adapt the classic sampler for the DOS problem, the
Wang-Landau algorithm, by replacing its random proposals with gradient-based
Monte Carlo proposals. This way, our GWL sampler investigates the
under-explored subsets of the input space much more efficiently. Extensive
experiments have verified the accuracy of the output distribution generated by
GWL and also showcased several interesting findings. For example, in a binary
image classification task, both CNN and ResNet mapped the majority of
human-unrecognizable images to very negative logit values.
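For orientation, the classic Wang-Landau algorithm with random proposals, the baseline that the abstract says GWL modifies, can be sketched as follows. This is an illustrative sketch under stated assumptions and not the GWL sampler itself: the "network" is a hypothetical stand-in (`toy_output` counts the active bits of a binary input) chosen because its exact output distribution is binomial, and the flatness threshold, check interval, and stopping factor are arbitrary choices. The gradient-based proposals described in the abstract are not implemented here.

```python
import math
import random


def toy_output(x):
    """Hypothetical stand-in for a network output: number of active bits."""
    return sum(x)


def wang_landau(n_bits=10, f_final=1e-3, flatness=0.8, check_every=1000,
                max_steps=2_000_000, seed=0):
    """Estimate the distribution of toy_output over all binary inputs."""
    rng = random.Random(seed)
    n_bins = n_bits + 1          # possible outputs are 0..n_bits
    log_g = [0.0] * n_bins       # running estimate of log density of states
    hist = [0] * n_bins          # visit histogram for the current stage
    f = 1.0                      # log modification factor

    x = [rng.randint(0, 1) for _ in range(n_bits)]
    e = toy_output(x)

    for step in range(1, max_steps + 1):
        # Random proposal: flip one uniformly chosen bit.
        i = rng.randrange(n_bits)
        e_new = e + (1 - 2 * x[i])   # 0 -> 1 adds one, 1 -> 0 removes one

        # Accept with probability min(1, g(e) / g(e_new)).
        if rng.random() < math.exp(min(0.0, log_g[e] - log_g[e_new])):
            x[i] = 1 - x[i]
            e = e_new

        # Penalise the current bin so rarely visited outputs become attractive.
        log_g[e] += f
        hist[e] += 1

        # When the histogram is flat enough, refine with a smaller factor.
        if step % check_every == 0:
            if min(hist) >= flatness * sum(hist) / n_bins:
                f /= 2.0
                hist = [0] * n_bins
                if f < f_final:
                    break

    # Normalise log_g into a probability distribution over outputs.
    m = max(log_g)
    log_z = m + math.log(sum(math.exp(v - m) for v in log_g))
    return [math.exp(v - log_z) for v in log_g]


if __name__ == "__main__":
    for k, p in enumerate(wang_landau()):
        print(k, round(p, 4))
```

Because the walker is pushed away from frequently visited outputs, rare output values (here, all bits off or all bits on) are visited about as often as common ones, which is what makes the approach attractive for under-explored regions of the input space.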

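This paper proposes a Gradient-based Wang-Landau sampling algorithm to explore the relationship between a neural network's output distribution and its input space more efficiently; in experiments on a binary image classification task, CNN and ResNet mapped most unrecognizable images to negative logit values.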