This study compares multilingual text classification methods using deep learning and embedding visualization. It applies LangDetect, LangId, FastText, and Sentence Transformer to a dataset covering 17 languages. It also examines how embedding dimensionality affects clustering: FastText yields clearer clusters in 2D visualization, which the study attributes to its training on an extensive multilingual corpus. The FastText multi-layer perceptron model achieved higher accuracy, precision, recall, and F1 score than the Sentence Transformer model. The study further compares multi-layer perceptron, LSTM, and convolutional models for classification. Overall, the results underscore the effectiveness of these techniques for multilingual text classification and the importance of training embeddings on large multilingual corpora. The work lays the groundwork for future research and helps practitioners build language detection and classification systems.
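The four reported metrics can be made concrete with a small worked example. The sketch below computes accuracy and macro-averaged precision, recall, and F1 for a multi-class language-identification task. Macro-averaging is an assumption here, since the abstract does not state which averaging scheme the study used, and the function name `macro_metrics` and the toy labels are hypothetical illustrations, not data from the study.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro-averaging (an assumption, not confirmed by the study) weights
    every language equally, regardless of how many samples it has.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this language
        else:
            fp[pred] += 1    # predicted language gains a false positive
            fn[truth] += 1   # true language gains a false negative

    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical toy labels for three languages (not data from the study).
y_true = ["en", "en", "fr", "fr", "de", "de"]
y_pred = ["en", "fr", "fr", "fr", "de", "en"]
scores = macro_metrics(y_true, y_pred)
print(scores)
```

With a class-balanced dataset, macro and weighted averages coincide; they diverge when some languages have far fewer samples than others, so the averaging choice matters when comparing models across 17 languages.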

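This study presents a comparative analysis of multilingual text classification methods using deep learning and embedding visualization, focusing on the FastText and Sentence Transformer models, and explores the impact of dimensionality on clustering. The results show that FastText produces clearer clusters in 2D visualization and achieves notable accuracy, precision, recall, and F1 score, outperforming the Sentence Transformer model. The study highlights the effectiveness of these techniques for multilingual text classification and the importance of training embeddings on large multilingual corpora. It lays the groundwork for future research and supports the development of language detection and classification systems. In addition, the study compares multi-layer perceptron, LSTM, and convolutional models.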

A Comparative Analysis of Deep Learning and Embedding Visualization for Cross-Lingual Text Classification and Identification