BriefGPT.xyz
Apr, 2021
Generating Datasets with Pretrained Language Models
Timo Schick, Hinrich Schütze
TL;DR
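This paper presents a method that uses pretrained language models to generate labeled text datasets, which makes it possible to learn high-quality sentence embeddings without human-labeled data. Experiments show that the approach performs well on several semantic textual similarity benchmarks.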
Abstract
To obtain high-quality sentence embeddings from pretrained language models, they must either be augmented with additional pretraining objectives or finetuned on large amounts of labeled text pairs. While the latt