This paper describes our multiclass classification system developed as part of the LTEDI@RANLP-2023 shared task. We used a BERT-based language model to detect homophobic and transphobic content in social media comments across five language conditions: English, Spanish, Hindi, Malayalam, and Tamil. We retrained a transformer-based cross-lingual pretrained language model, XLM-RoBERTa, on spatially and temporally relevant social media language data. We also retrained a subset of models on simulated script-mixed social media language data, with varied results. Our seven-label classification system for Malayalam was the best performing by weighted macro-averaged F1 score (ranked first out of six); performance varied for the other language and class-label conditions. Including the spatio-temporal data improved classification performance for all language and task conditions compared with the baseline. The results suggest that transformer-based language classification systems are sensitive to register-specific and language-specific retraining.
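
As an illustration of the ranking metric mentioned above, the following is a minimal sketch of a support-weighted average of per-class F1 scores. It assumes that "weighted macro averaged F1" means each class's F1 is weighted by its share of the gold labels; the abstract does not spell out the exact definition, and the function names and toy labels here are hypothetical rather than taken from the shared task.

```python
from collections import Counter


def per_class_f1(y_true, y_pred, label):
    """F1 for one class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denom = 2 * tp + fp + fn
    # Define F1 as 0.0 when the class is never predicted correctly or at all.
    return 2 * tp / denom if denom else 0.0


def weighted_f1(y_true, y_pred):
    """Average per-class F1, weighting each class by its gold-label support."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)
    total = len(y_true)
    return sum(
        per_class_f1(y_true, y_pred, label) * count / total
        for label, count in support.items()
    )


# Toy example with hypothetical labels (not the shared-task label set).
gold = ["none", "none", "homophobia", "transphobia"]
pred = ["none", "homophobia", "homophobia", "transphobia"]
score = weighted_f1(gold, pred)
```

Because classes are weighted by support, frequent classes dominate this score; an unweighted macro average would instead give every class equal influence.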

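This study used a BERT-based language model to develop a multiclass classification system for detecting homophobic and transphobic content in social media comments across five language conditions: English, Spanish, Hindi, Malayalam, and Tamil. It found that using spatio-temporally relevant social media language data can improve the performance of language classification systems.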

Detecting Homophobia/Transphobia in Social Media Comments Using Spatio-Temporally Retrained Language Models