BriefGPT.xyz
Mar, 2023
Retrieval-Augmented Classification with Decoupled Representation
Xinnian Liang, Shuangzhi Wu, Hui Huang, Jiaqi Bai, Chao Bian...
TL;DR
This paper proposes a mixed-granularity Chinese BERT model (MigBERT) that considers characters and words jointly, designing objective functions to learn both character-level and word-level representations. MigBERT achieves new SOTA performance on a range of Chinese NLP tasks; the experiments show that words carry richer semantics than characters, and that MigBERT also works for Japanese.
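The summary above describes combining character-level and word-level representations. As a minimal, hypothetical sketch (not the paper's actual method), word-level vectors can be derived from character vectors by pooling over each word's character span; the function name, span format, and mean pooling are all illustrative assumptions:

```python
import numpy as np

def word_reps_from_chars(char_vecs, word_spans):
    """Pool character vectors into word-level vectors by mean pooling.

    char_vecs: (num_chars, dim) array of character representations.
    word_spans: list of (start, end) character index spans, one per word.
    This is an illustrative sketch, not MigBERT's actual architecture.
    """
    return np.stack([char_vecs[s:e].mean(axis=0) for s, e in word_spans])

# Toy example: 4 character vectors forming 2 two-character words.
chars = np.arange(8.0).reshape(4, 2)  # 4 chars, dim = 2
spans = [(0, 2), (2, 4)]              # two 2-character words
words = word_reps_from_chars(chars, spans)
print(words.shape)  # (2, 2)
```

A real model would instead learn these combinations through its training objectives, but the span-based grouping of characters into words is the core idea of mixed-granularity modeling.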
Abstract
Pretrained language models (PLMs) have shown marvelous improvements across various NLP tasks. Most Chinese PLMs simply treat an input text as a sequence of characters, and completely ignore word information. …