This paper describes our system for detecting homophobia and transphobia in social media comments, developed for the shared task at LT-EDI-2024. We took a transformer-based approach to building a multiclass classification model for ten language conditions (English, Spanish, Gujarati, Hindi, Kannada, Malayalam, Marathi, Tamil, Tulu, and Telugu). During domain adaptation, we introduced synthetic and organic instances of script-switched language data to mirror the linguistic realities of social media language as seen in the labelled training data. Our system ranked second for Gujarati and Telugu, with varying levels of performance for the other language conditions. The results suggest that incorporating elements of paralinguistic behaviour such as script-switching may improve the performance of language detection systems, especially for under-resourced language conditions.

Automated Detection of Anti-LGBTQ+ Hate Speech in Under-Resourced Languages at LT-EDI-2024