BriefGPT.xyz
Jun, 2018
Automatic Language Identification for Romance Languages using Stop Words and Diacritics
Ciprian-Octavian Truică, Julien Velcin, Alexandru Boicea
TL;DR
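This study proposes a statistical method, based on dictionaries of stop words and diacritics, for automatically identifying the language of a text, with a focus on the Romance languages. Experiments show that the method's accuracy exceeds 90% on small texts and 99.8% on large texts.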
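The TL;DR describes the approach only at a high level. The sketch below is a minimal illustration under stated assumptions. The per-language dictionaries are tiny hypothetical lists, not the paper's. The equal-weight scoring rule is also an assumption. Each candidate language is scored by how many of its stop-word tokens and diacritic characters occur in the text, and the highest-scoring language is returned.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny dictionaries for illustration only;
# the paper's actual stop-word and diacritic lists are not reproduced here.
STOP_WORDS = {
    "fr": {"le", "la", "les", "et", "est", "un", "une", "des", "du", "pas"},
    "es": {"el", "la", "los", "las", "y", "es", "un", "una", "del", "que"},
    "it": {"il", "lo", "gli", "e", "è", "un", "una", "del", "che", "non"},
    "pt": {"o", "os", "as", "e", "é", "um", "uma", "do", "da", "não"},
    "ro": {"și", "este", "un", "o", "în", "de", "la", "nu", "cu", "pe"},
}
DIACRITICS = {
    "fr": set("éèêëàâçîïôûùœ"),
    "es": set("áéíóúñü"),
    "it": set("àèéìòù"),
    "pt": set("áâãàçéêíóôõú"),
    "ro": set("ăâîșț"),
}


def identify_language(text: str) -> str:
    """Return the language code whose dictionaries best match the text.

    Assumed scoring: (stop-word token hits) + (diacritic character hits),
    weighted equally and not normalized by text length.
    """
    lowered = text.lower()
    # \w is Unicode-aware for str patterns, so accented letters stay in tokens.
    token_counts = Counter(re.findall(r"\w+", lowered))
    scores = {}
    for lang, stop_words in STOP_WORDS.items():
        stop_hits = sum(token_counts[word] for word in stop_words)
        diacritic_hits = sum(1 for ch in lowered if ch in DIACRITICS[lang])
        scores[lang] = stop_hits + diacritic_hits
    # On a tie, max() keeps the first language in dictionary order.
    return max(scores, key=scores.get)
```

For example, `identify_language("Pisica este în casă și nu doarme.")` returns `"ro"`. Both evidence sources contribute: the stop words *este*, *în*, *și*, *nu*, and the diacritics *ă*, *î*, *ș*. Because the scores are raw counts, a fuller version would probably normalize by text length and weight the two sources. The accuracy figures reported above should not be expected from these toy lists.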
Abstract
Automatic language identification is a natural language processing problem that tries to determine the natural language of given content. In this paper we present a statistical method for …