State-of-the-art parameter-efficient fine-tuning methods rely on introducing
adapter modules between the layers of a pretrained language model. However,
such modules are trained separately for each task and thus do not enable
sharing information across tasks. In this paper, we show that we can learn
adapter parameters for all layers and tasks by generating them using shared
hypernetworks, which condition on task, adapter position, and layer id in a
transformer model. This parameter-efficient multi-task learning framework
allows us to achieve the best of both worlds by sharing knowledge across tasks
via hypernetworks while enabling the model to adapt to each individual task
through task-specific adapters. Experiments on the well-known GLUE benchmark
show improved performance in multi-task learning while adding only 0.29%
parameters per task. We additionally demonstrate substantial performance
improvements in few-shot domain generalization across a variety of tasks. Our
code is publicly available in this https URL
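
The idea of generating adapter weights from a shared hypernetwork can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the class name `SharedHypernetwork`, the position labels, the concatenation of the three embeddings, the random initialisation, and the ReLU bottleneck adapter are all assumptions made for the example.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_vector(size, rng):
    return [rng.uniform(-1.0, 1.0) for _ in range(size)]


class SharedHypernetwork:
    """Hypothetical hypernetwork: one shared set of generator weights emits
    adapter parameters for every (task, position, layer) combination."""

    def __init__(self, tasks, positions, num_layers, embed_dim,
                 hidden_dim, bottleneck_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.bottleneck_dim = bottleneck_dim
        # Embeddings for the three conditioning signals named in the abstract.
        self.task_emb = {t: random_vector(embed_dim, rng) for t in tasks}
        self.position_emb = {p: random_vector(embed_dim, rng) for p in positions}
        self.layer_emb = [random_vector(embed_dim, rng) for _ in range(num_layers)]
        # Shared generators map the conditioning vector to flattened weights.
        # Assumption: the three embeddings are simply concatenated.
        num_weights = hidden_dim * bottleneck_dim
        self.down_generator = [random_vector(3 * embed_dim, rng)
                               for _ in range(num_weights)]
        self.up_generator = [random_vector(3 * embed_dim, rng)
                             for _ in range(num_weights)]

    def generate(self, task, position, layer):
        """Return (down, up) adapter matrices for one task, position and layer."""
        condition = (self.task_emb[task]
                     + self.position_emb[position]
                     + self.layer_emb[layer])  # list concatenation
        flat_down = matvec(self.down_generator, condition)
        flat_up = matvec(self.up_generator, condition)
        d, m = self.hidden_dim, self.bottleneck_dim
        down = [flat_down[i * d:(i + 1) * d] for i in range(m)]  # m rows of length d
        up = [flat_up[i * m:(i + 1) * m] for i in range(d)]      # d rows of length m
        return down, up


def adapter_forward(hidden, down, up):
    """Assumed adapter form: down-project, ReLU, up-project, residual add."""
    bottleneck = [max(0.0, v) for v in matvec(down, hidden)]
    return [h + u for h, u in zip(hidden, matvec(up, bottleneck))]


hyper = SharedHypernetwork(
    tasks=["task_a", "task_b"],
    positions=["after_attention", "after_feed_forward"],
    num_layers=2, embed_dim=4, hidden_dim=8, bottleneck_dim=2,
)
down, up = hyper.generate("task_a", "after_attention", layer=0)
adapted = adapter_forward([0.5] * 8, down, up)
```

In this sketch the only task-specific parameters are the task embeddings; the generators are shared by all tasks, positions, and layers, which is what lets knowledge flow across tasks while each task still receives its own adapter weights.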

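This paper proposes a parameter-efficient multi-task learning framework that learns adapter parameters for all layers and tasks by generating them with shared hypernetworks. The framework shares knowledge across tasks while letting the model adapt to each individual task through task-specific adapters, and it improves multi-task learning performance on the well-known GLUE benchmark.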

A parameter-efficient method for multi-task fine-tuning of Transformers with shared hypernetworks

Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks

Recent advances in multilingual dependency parsing have brought the idea of a
truly universal parser closer to reality. However, cross-language interference
and restrained model capacity remain major obstacles. To address this, we
propose a novel multilingual task adaptation approach based on contextual
parameter generation and adapter modules. This approach makes it possible to
learn adapters via language embeddings while sharing model parameters across
languages. It also allows for an easy but effective integration of existing
linguistic typology features into the parsing network. The resulting parser,
UDapter, outperforms strong monolingual and multilingual baselines on the
majority of both high-resource and low-resource (zero-shot) languages, showing
the success of the proposed adaptation approach. Our in-depth analyses show
that soft parameter sharing via typological features is key to this success.
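
Contextual parameter generation from language embeddings can be illustrated with a small sketch. It is not the UDapter implementation: the class name `ContextualAdapterGenerator`, the linear projection of typological features into a language embedding, the dimensions, and the binary feature vectors are all hypothetical choices made for the example.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class ContextualAdapterGenerator:
    """Hypothetical generator: adapter weights are a function of a language
    embedding, which is itself computed from typological features."""

    def __init__(self, num_features, lang_dim, hidden_dim, bottleneck_dim, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(rows)]

        self.hidden_dim = hidden_dim
        self.bottleneck_dim = bottleneck_dim
        # Assumption: a single linear map turns features into the embedding.
        self.feature_proj = init(lang_dim, num_features)
        # Shared across all languages: embedding -> flattened adapter weights.
        self.generator = init(hidden_dim * bottleneck_dim, lang_dim)

    def language_embedding(self, typology_features):
        return matvec(self.feature_proj, typology_features)

    def adapter_weights(self, typology_features):
        """Return a bottleneck_dim x hidden_dim down-projection matrix."""
        embedding = self.language_embedding(typology_features)
        flat = matvec(self.generator, embedding)
        d = self.hidden_dim
        return [flat[i * d:(i + 1) * d] for i in range(self.bottleneck_dim)]


generator = ContextualAdapterGenerator(
    num_features=5, lang_dim=3, hidden_dim=8, bottleneck_dim=2)

# Made-up binary feature vectors, not taken from any typology database.
seen_language = [1, 0, 1, 1, 0]
unseen_language = [1, 0, 1, 0, 0]  # no training data, only its features

weights_seen = generator.adapter_weights(seen_language)
weights_unseen = generator.adapter_weights(unseen_language)
```

Because every language passes through the same `feature_proj` and `generator`, parameters are shared softly rather than copied per language, and a language without training data still receives adapter weights as long as its typological features are known.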

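This work proposes a new multilingual task adaptation approach based on contextual parameter generation and adapter modules. The approach learns adapters via language embeddings while sharing model parameters across languages, and it integrates existing linguistic typology features into the parsing network. The resulting parser outperforms strong monolingual and multilingual baselines on most high-resource and low-resource languages, which shows the success of the proposed adaptation approach.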

UDapter: language adaptation for truly universal dependency parsing

UDapter: Language Adaptation for Truly Universal Dependency Parsing

Fine-tuning large pre-trained models is an effective transfer mechanism in
NLP. However, in the presence of many downstream tasks, fine-tuning is
parameter inefficient: an entire new model is required for every task. As an
alternative, we propose transfer with adapter modules. Adapter modules yield a
compact and extensible model; they add only a few trainable parameters per
task, and new tasks can be added without revisiting previous ones. The
parameters of the original network remain fixed, yielding a high degree of
parameter sharing. To demonstrate adapters' effectiveness, we transfer the
recently proposed BERT Transformer model to 26 diverse text classification
tasks, including the GLUE benchmark. Adapters attain near state-of-the-art
performance, whilst adding only a few parameters per task. On GLUE, we attain
within 0.4% of the performance of full fine-tuning, adding only 3.6% parameters
per task. By contrast, fine-tuning trains 100% of the parameters per task.
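
The parameter accounting behind this comparison can be made concrete with a short sketch. The bottleneck form of the adapter (a down-projection and an up-projection, each with a bias), the sizes 768 and 64, and the task count of 9 are assumptions chosen for illustration; only the 3.6% figure comes from the abstract.

```python
def adapter_parameter_count(hidden_dim, bottleneck_dim):
    """Parameters in one assumed bottleneck adapter: two linear maps with biases."""
    down = hidden_dim * bottleneck_dim + bottleneck_dim
    up = bottleneck_dim * hidden_dim + hidden_dim
    return down + up


def relative_storage(num_tasks, added_fraction_per_task=None):
    """Total parameters stored for num_tasks tasks, in units of one base model.

    With added_fraction_per_task=None this models full fine-tuning, where
    every task keeps its own complete copy of the network. Otherwise one
    frozen base model is shared and each task adds only its adapters.
    """
    if added_fraction_per_task is None:
        return float(num_tasks)
    return 1.0 + num_tasks * added_fraction_per_task


# Hypothetical sizes: hidden width 768, bottleneck width 64.
print(adapter_parameter_count(768, 64))  # 99136

# Hypothetical suite of 9 tasks.
print(relative_storage(9))          # full fine-tuning: 9 base models
print(relative_storage(9, 0.036))   # adapters: roughly 1.32 base models
```

Under these assumptions, storage for full fine-tuning grows by one whole model per task, while the adapter setup grows by only the per-task fraction, which is why new tasks can be added without touching the frozen base network.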

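Adapter modules enable parameter sharing across tasks and avoid retraining an entire network for every task. Applied to the BERT Transformer, adapter modules reach performance close to full fine-tuning while adding only 3.6% trainable parameters per task.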