This work identifies a simple pre-training mechanism that leads to
representations exhibiting better continual and transfer learning. This
mechanism -- the repeated resetting of weights in the last layer, which we
nickname "zapping" -- was originally designed for a meta-continual-learning
procedure, yet we show it is surprisingly applicable in many settings beyond
both meta-learning and continual learning. In our experiments, we wish to
transfer a pre-trained image classifier to a new set of classes, in a few
shots. We show that our zapping procedure results in improved transfer accuracy
and/or more rapid adaptation in both standard fine-tuning and continual
learning settings, while being simple to implement and computationally
efficient. In many cases, we achieve performance on par with state-of-the-art
meta-learning without needing the expensive higher-order gradients, by using a
combination of zapping and sequential learning. An intuitive explanation for
the effectiveness of this zapping procedure is that representations trained
with repeated zapping learn features that are capable of rapidly adapting to
newly initialized classifiers. Such an approach may be considered a
computationally cheaper type of, or alternative to, meta-learning rapidly
adaptable features with higher-order gradients. This adds to recent work on the
usefulness of resetting neural network parameters during training, and invites
further investigation of this mechanism.
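
As a rough illustration of the mechanism described above, the sketch below resets ("zaps") selected rows of a classifier's last layer to fresh random values while leaving everything else untouched. It is a minimal, framework-free sketch and not the authors' implementation: the function names `init_row` and `zap`, the small-Gaussian initialization, and the `scale` value are all assumptions made for the example.

```python
import random


def init_row(n_features, rng, scale=0.01):
    """Draw one freshly initialized weight row (small Gaussian values; assumed init)."""
    return [rng.gauss(0.0, scale) for _ in range(n_features)]


def zap(last_layer, rng, class_ids=None, scale=0.01):
    """Re-initialize last-layer rows in place; earlier layers are not touched.

    last_layer: list of per-class weight rows (n_classes x n_features).
    class_ids:  indices of the rows to reset; None resets every row.
    """
    if class_ids is None:
        class_ids = range(len(last_layer))
    for c in class_ids:
        last_layer[c] = init_row(len(last_layer[c]), rng, scale)
    return last_layer


rng = random.Random(0)
head = [[1.0, 1.0, 1.0] for _ in range(4)]  # stand-in for a trained 4-class head
zap(head, rng, class_ids=[1, 3])            # reset only classes 1 and 3
```

In a pre-training loop, a call like `zap(...)` would be repeated at intervals, so the feature layers are repeatedly forced to work with a newly initialized classifier.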

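Through experiments, this paper finds that a simple pre-training mechanism, repeatedly resetting the weights of the last layer ("zapping"), improves transfer and continual learning performance; the mechanism applies in many settings and is simple and computationally efficient.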

Reset It and Forget It: Relearning Last-Layer Weights Improves Continual and Transfer Learning

Existing sequential recommendation methods rely on large amounts of training
data and usually suffer from the data sparsity problem. To tackle this, the
pre-training mechanism has been widely adopted, which attempts to leverage
large-scale data to perform self-supervised learning and transfer the
pre-trained parameters to downstream tasks. However, previous pre-trained
models for recommendation focus on leveraging universal sequence patterns from
user behaviour sequences and item information, while neglecting to capture
personalized interests with the heterogeneous user information, which has been
shown effective in contributing to personalized recommendation. In this paper,
we propose a method to enhance pre-trained models with heterogeneous user
information, called User-aware Pre-training for Recommendation (UPRec).
Specifically, UPRec leverages the user attributes and structured social graphs
to construct self-supervised objectives in the pre-training stage and proposes
two user-aware pre-training tasks. Comprehensive experimental results on
several real-world large-scale recommendation datasets demonstrate that UPRec
can effectively integrate user information into pre-trained models and thus
provide more appropriate recommendations for users.

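This paper proposes UPRec, a method that constructs self-supervised tasks from user attributes and structured social graphs to integrate user information into pre-trained models and provide users with more appropriate recommendations.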

UPRec: User-Aware Pre-training for Recommender Systems

We investigate two specific manifestations of compositionality in Neural
Machine Translation (NMT): (1) Productivity - the ability of the model to
extend its predictions beyond the observed length in training data and (2)
Systematicity - the ability of the model to systematically recombine known
parts and rules. We evaluate a standard Sequence to Sequence model on tests
designed to assess these two properties in NMT. We quantitatively demonstrate
that inadequate temporal processing, in the form of poor encoder
representations, is a bottleneck for both Productivity and Systematicity. We
propose a simple pre-training mechanism which improves model performance on
the two properties and leads to a significant improvement in BLEU scores.
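
To make the Productivity property concrete, the sketch below builds a length-based split: training pairs are capped at a maximum source length, and longer pairs are held out for testing. This is only one plausible way to set up such a test and is not taken from the paper; the function name `length_split`, the whitespace tokenization, and the toy sentence pairs are assumptions.

```python
def length_split(pairs, max_train_len):
    """Split (source, target) pairs by source length in whitespace tokens.

    Pairs with at most max_train_len source tokens go to train;
    longer pairs are held out to probe generalization to unseen lengths.
    """
    train, test = [], []
    for src, tgt in pairs:
        bucket = train if len(src.split()) <= max_train_len else test
        bucket.append((src, tgt))
    return train, test


# Toy placeholder data, not real translation pairs.
pairs = [
    ("a b", "x y"),
    ("a b c", "x y z"),
    ("a b c d e", "x y z u v"),
]
train, test = length_split(pairs, max_train_len=3)
```

A model trained on `train` and scored on `test` is then evaluated only on sources longer than any it saw during training.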

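This study examines two specific manifestations of compositionality in neural machine translation, productivity and systematicity, and shows that a simple pre-training mechanism mitigates inadequate encoder representations and significantly improves BLEU scores.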