BriefGPT.xyz
Oct, 2022
Mining Word Boundaries in Speech as Naturally Annotated Word Segmentation Data
Lei Zhang, Shilin Zhou, Chen Gong, Zhenghua Li, Zhefeng Wang...
TL;DR
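This study proposes a method for improving Chinese word segmentation (CWS) in cross-domain and low-resource settings: it mines naturally annotated data from pauses in speech to train CWS models, and shows that this significantly improves CWS performance.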
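To illustrate the idea of treating speech pauses as word boundaries, here is a minimal sketch. It is not the authors' pipeline: the character-level timestamps, the `mine_boundaries` and `split_at` helpers, and the 0.2-second `min_pause` threshold are all assumptions made for illustration. The input is assumed to come from some character-level alignment of audio to its transcript.

```python
from typing import List, Tuple

# (character, start_sec, end_sec): hypothetical character-level alignment output
AlignedChar = Tuple[str, float, float]


def mine_boundaries(chars: List[AlignedChar], min_pause: float = 0.2) -> List[int]:
    """Return each index i where a pause of at least `min_pause` seconds
    separates chars[i] from chars[i + 1]. The threshold is an assumed value."""
    boundaries = []
    for i in range(len(chars) - 1):
        gap = chars[i + 1][1] - chars[i][2]  # next start minus current end
        if gap >= min_pause:
            boundaries.append(i)
    return boundaries


def split_at(chars: List[AlignedChar], boundaries: List[int]) -> List[str]:
    """Cut the transcript after every mined boundary index."""
    text = "".join(ch for ch, _, _ in chars)
    pieces, start = [], 0
    for i in boundaries:
        pieces.append(text[start:i + 1])
        start = i + 1
    pieces.append(text[start:])
    return pieces


# Made-up timings: a 0.3 s pause falls between "欢" and "音".
sample = [
    ("我", 0.00, 0.20),
    ("喜", 0.20, 0.38),
    ("欢", 0.38, 0.60),
    ("音", 0.90, 1.10),
    ("乐", 1.10, 1.30),
]

mined = mine_boundaries(sample)
print(split_at(sample, mined))  # ['我喜欢', '音乐']
```

Pauses only yield partial annotation: the sketch finds the boundary before "音乐" but not the one between "我" and "喜欢", so data mined this way would presumably supplement rather than replace manually segmented training text.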
Abstract
Chinese word segmentation (CWS) models have achieved very high performance when the training data is sufficient and in-domain. However, the performance drops drastically when shifting to cross-domain and low-resource sc...