BriefGPT.xyz
Oct, 2021
CCQA: A New Web-Scale Question Answering Dataset for Model Pre-Training
Patrick Huber, Armen Aghajanyan, Barlas Oğuz, Dmytro Okhonko, Wen-tau Yih...
TL;DR
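This paper proposes an approach to in-domain pre-training that uses a large-scale, natural, and diverse question-answering dataset built on the Common Crawl project. The approach applies to zero-shot, low-resource, and fine-tuned settings for open-domain question-answering tasks, and demonstrates the potential of pre-training for this task.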
Abstract
With the rise of large-scale pre-trained language models, open-domain question-answering (ODQA) has become an important research topic in NLP. Based on the popular pre-training fine-tuning approach, we posit that