BriefGPT.xyz
Sep, 2023
NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages
Samuel Cahyawijaya, Holy Lovenia, Fajri Koto, Dea Adhista, Emmanuel Dave...
TL;DR
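A case study on Indonesian local languages shows that datasets produced by native speakers through paragraph writing are of higher quality in lexical diversity and cultural content, which helps extend natural language processing technology to less-studied languages.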
Abstract
Democratizing access to natural language processing (NLP) technology is crucial, especially for underrepresented and extremely low-resource languages. Previous research has focused on developing labeled and unlab