BriefGPT.xyz
May, 2021
Joint Optimization of Tokenization and Downstream Model
Tatsuya Hiraoka, Sho Takase, Kei Uchiumi, Atsushi Keyaki, Naoaki Okazaki
TL;DR
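This paper proposes a novel method that jointly optimizes a tokenizer and a downstream model to find an appropriate tokenization. The method is applicable to various NLP tasks, including post-processing and translation across multiple languages. Experimental results show that the method improves performance by determining appropriate tokenizations.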
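As a rough illustration of what optimizing a tokenizer against a downstream model can mean, the Python sketch below scores a few candidate tokenizations of one word, turns those scores into probabilities over the candidate set, and weights a per-tokenization downstream loss by them. Gradient descent on the tokenizer's scores then shifts probability toward tokenizations the model handles better. Everything here is an assumption for illustration: the candidate list, the per-token scores `theta`, the loss table `TOY_LOSS` (a hypothetical stand-in for a frozen downstream model), and all function names are hypothetical, and this is not the authors' implementation.

```python
import math
from collections import Counter

# Hypothetical toy setup (illustration only): candidate tokenizations of one word.
CANDIDATES = [("un", "happy"), ("u", "n", "happy"), ("unhappy",)]

# Hypothetical per-tokenization losses standing in for a frozen downstream model.
TOY_LOSS = {("un", "happy"): 0.5, ("u", "n", "happy"): 2.0, ("unhappy",): 1.0}


def tokenization_probs(candidates, theta):
    """Softmax over candidates of summed per-token scores theta[tok].

    Simplification (assumption): token scores are free parameters, not a
    normalized unigram distribution; probabilities are renormalized over
    the candidate set only.
    """
    scores = [sum(theta[t] for t in seg) for seg in candidates]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def expected_loss(candidates, theta, loss_fn):
    """Downstream loss weighted by tokenization probabilities."""
    probs = tokenization_probs(candidates, theta)
    return sum(p * loss_fn(seg) for p, seg in zip(probs, candidates))


def tokenizer_grad(candidates, theta, loss_fn):
    """Gradient of the expected loss w.r.t. each token score.

    With L = sum_i p_i * l_i and p = softmax(score), dL/dscore_i = p_i * (l_i - L);
    dscore_i/dtheta[t] is the number of times token t occurs in candidate i.
    """
    probs = tokenization_probs(candidates, theta)
    losses = [loss_fn(seg) for seg in candidates]
    total = sum(p * l for p, l in zip(probs, losses))
    grad = {t: 0.0 for t in theta}
    for p, l, seg in zip(probs, losses, candidates):
        for tok, count in Counter(seg).items():
            grad[tok] += p * (l - total) * count
    return grad


def sgd_step(candidates, theta, loss_fn, lr=1.0):
    """One gradient-descent step on the tokenizer scores only."""
    grad = tokenizer_grad(candidates, theta, loss_fn)
    return {t: theta[t] - lr * grad[t] for t in theta}


theta = {t: -2.0 for t in ["un", "happy", "u", "n", "unhappy"]}
before = expected_loss(CANDIDATES, theta, TOY_LOSS.get)
theta = sgd_step(CANDIDATES, theta, TOY_LOSS.get)
after = expected_loss(CANDIDATES, theta, TOY_LOSS.get)
print(f"expected loss: {before:.4f} -> {after:.4f}")
```

Keeping the downstream model frozen keeps the sketch self-contained; optimizing both jointly would additionally backpropagate the same weighted loss into the downstream model's own parameters.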
Abstract
Since traditional tokenizers are isolated from a downstream task and model, they cannot output an appropriate tokenization depending on the task and model, although recent studies imply that the appropriate tokenization …