Alessandro Sordoni, Xingdi Yuan, Marc-Alexandre Côté, Matheus Pereira, Adam Trischler...
TL;DR: This paper proposes the Deep Language Network (DLN) architecture, which uses prompt optimization and a variational inference algorithm to achieve high performance from smaller, weaker LLMs, and studies few-shot learning in this setting.
Abstract
We view large language models (LLMs) as stochastic \emph{language layers} in a network, where the learnable parameters are the natural language \emph{prompts} at each layer. We stack two such layers, feeding the output of one layer to the next. We call the stacked architecture a \emph{Deep Language Network} (DLN).
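To make the stacking idea concrete, here is a minimal sketch of a two-layer setup in which each layer's only "weight" is its prompt and the first layer's sampled text feeds the second. The `call_llm` wrapper, the `LanguageLayer` class, and the example prompts are hypothetical stand-ins, not the paper's actual implementation:

```python
# Minimal sketch of two stacked language layers. `call_llm` is a
# hypothetical placeholder for any LLM client; plug in your own.

def call_llm(prompt: str, text: str) -> str:
    """Hypothetical LLM wrapper: conditions on a prompt plus input text
    and returns a sampled completion."""
    raise NotImplementedError("wire up an actual LLM client here")

class LanguageLayer:
    """A stochastic layer whose learnable parameter is a natural-language prompt."""

    def __init__(self, prompt: str):
        self.prompt = prompt  # the only trainable "parameter" of this layer

    def __call__(self, x: str) -> str:
        # Sample an output conditioned on the layer's prompt and its input.
        return call_llm(self.prompt, x)

# Two stacked layers: the first layer's output becomes the second's input,
# so the hidden "activation" passed between them is itself natural language.
layer1 = LanguageLayer("Summarize the key facts in the input.")
layer2 = LanguageLayer("Answer the question using the facts above.")

def dln_forward(x: str) -> str:
    h = layer1(x)
    return layer2(h)
```

In the paper's framing, training such a network means optimizing the prompt strings of both layers jointly (via variational inference over the intermediate text), rather than updating any model weights.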