BriefGPT.xyz
Dec, 2021
Controlling Conditional Language Models without Catastrophic Forgetting
Controlling Conditional Language Models with Distributional Policy Gradients
Tomasz Korbak, Hady Elsahar, German Kruszewski, Marc Dymetman
TL;DR
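This paper examines how energy-based models (EBMs) can be used for fine-tuning and proposes Conditional Distributional Policy Gradients (CDPG) for fine-tuning on conditional tasks. The results show that CDPG helps the model better meet task-specific requirements without destroying the general capabilities of the pretrained model.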
Abstract
Machine learning is shifting towards general-purpose pretrained generative models, trained in a self-supervised manner on large amounts of data, which can then be applied to solve a large number of tasks. However