Aug, 2020
Uralic Language Identification (ULI) 2020 shared task dataset and the Wanca 2017 corpus
Tommi Jauhiainen, Heidi Jauhiainen, Niko Partanen, Krister Lindén
TL;DR
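This paper introduces the Wanca 2017 corpus and its application to Uralic language identification, along with baseline language identification experiments based on the ULI 2020 dataset.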
Abstract
This article introduces the Wanca 2017 corpus of texts crawled from the internet from which the sentences in rare Uralic languages for the use of the ...