BriefGPT.xyz
Jul, 2023
MorphPiece: Moving away from Statistical Language Representation
Haris Jabbar
TL;DR
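This work proposes MorphPiece, a linguistically motivated tokenization scheme based on morphological segmentation, and uses it to train MorphGPT, a GPT-based language model. Compared with a model that uses a standard BPE tokenizer, MorphGPT is reported to perform better, both on large language model benchmarks and on NLP tasks.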
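To make the contrast concrete, here is a minimal Python sketch of what splitting words along morpheme boundaries can look like. It is a toy illustration, not the MorphPiece algorithm: the table `MORPHEME_TABLE`, the helper names, and the fixed-size fallback for unknown words are all assumptions made for this example, and the fallback is only a placeholder for a statistical tokenizer, not BPE.

```python
# Hypothetical morpheme table; the entries are illustrative only and are
# not taken from the paper.
MORPHEME_TABLE = {
    "unhappiness": ["un", "happy", "ness"],
    "tokenizers": ["token", "ize", "er", "s"],
}


def fallback_split(word, size=3):
    """Placeholder for a statistical tokenizer: fixed-size character chunks."""
    return [word[i:i + size] for i in range(0, len(word), size)]


def morph_tokenize(text):
    """Split each word into morphemes if it is in the table, else fall back."""
    tokens = []
    for word in text.lower().split():
        tokens.extend(MORPHEME_TABLE.get(word) or fallback_split(word))
    return tokens


if __name__ == "__main__":
    print(morph_tokenize("Unhappiness tokenizers corpora"))
```

Words found in the table come out as linguistically meaningful units, while the unknown word is cut at arbitrary positions, which is the kind of split a purely statistical scheme is free to produce.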
Abstract
Tokenization is a critical part of modern NLP pipelines. However, contemporary tokenizers for Large Language Models are based on statistical analysis of text corpora, without much consideration to the