BriefGPT.xyz
Aug, 2023
SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore
Sewon Min, Suchin Gururangan, Eric Wallace, Hannaneh Hajishirzi, Noah A. Smith...
TL;DR
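By training a parametric language model on an Open License Corpus (OLC) and drawing on a datastore that contains copyrighted content only at inference time, SILO models improve performance while complying with data-use regulations such as the fair use doctrine in the United States and the GDPR in the European Union.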
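The split described above, a parametric model plus a datastore consulted only at inference, can be illustrated with a small sketch. This summary does not say how the datastore is queried, so the code assumes a nearest-neighbor interpolation in the style of kNN-LM; the function names, the toy vectors, and the mixing weight `lam` are all hypothetical and chosen only for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def knn_distribution(query, datastore, vocab_size, k=2):
    """Next-token distribution from the k nearest datastore entries.

    datastore: list of (key_vector, next_token_id) pairs (hypothetical layout).
    """
    # Squared L2 distance from the query to every stored key, nearest first.
    nearest = sorted(
        (sum((q - x) ** 2 for q, x in zip(query, key)), tok)
        for key, tok in datastore
    )[:k]
    # Closer neighbors receive more weight.
    weights = softmax([-d for d, _ in nearest])
    probs = [0.0] * vocab_size
    for w, (_, tok) in zip(weights, nearest):
        probs[tok] += w
    return probs

def next_token_probs(p_lm, query, datastore, lam=0.5):
    """Mix the parametric distribution with the datastore distribution.

    With an empty datastore the output is exactly the parametric model's
    distribution, so removing entries removes their influence.
    """
    if not datastore:
        return list(p_lm)
    p_knn = knn_distribution(query, datastore, len(p_lm))
    return [(1 - lam) * a + lam * b for a, b in zip(p_lm, p_knn)]

# Toy example: a 3-token vocabulary and a 3-entry datastore.
p_lm = [0.7, 0.2, 0.1]
datastore = [([0.0, 0.0], 2), ([1.0, 1.0], 1), ([5.0, 5.0], 0)]
mixed = next_token_probs(p_lm, [0.0, 0.0], datastore, lam=0.5)
```

In this toy setup the datastore shifts probability toward token 2, and passing an empty datastore returns the parametric distribution unchanged, which is the sense in which restricted data stays isolated from the model weights.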
Abstract
The legality of training language models (LMs) on copyrighted or otherwise restricted data is under intense debate. However, as we show, model pe…