Detecting subjectivity in news sentences is crucial for identifying media bias, enhancing credibility, and combating misinformation by flagging opinion-based content. It provides insights into public sentiment, empowers readers to make informed decisions, and encourages critical thinking. While prior research has developed methods and systems for this purpose, most efforts have focused on English and other high-resource languages. In this study, we present the first large dataset for subjectivity detection in Arabic, consisting of ~3.6K manually annotated sentences along with GPT-4o-based explanations. In addition, we include instructions (in both English and Arabic) to facilitate LLM-based fine-tuning. We provide an in-depth analysis of the dataset and the annotation process, together with extensive benchmark results covering PLMs and LLMs. Our analysis of the annotation process highlights that annotators were strongly influenced by their political, cultural, and religious backgrounds, especially at the beginning of the annotation process. The experimental results suggest that LLMs with in-context learning provide better performance. We aim to release the dataset and resources to the community.

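This study introduces the first large dataset for subjectivity detection in Arabic, comprising about 3.6K manually annotated sentences with GPT-4o-based explanations. We also provide instructions in English and Arabic to facilitate LLM-based fine-tuning, and present an in-depth analysis of the dataset, the annotation process, and extensive benchmark results, including pre-trained language models (PLMs) and LLMs. Our analysis shows that annotators were strongly influenced by their political, cultural, and religious backgrounds in the early stage of the annotation process. The experimental results indicate that LLMs with in-context learning perform better. We aim to release the dataset and resources to the community.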

ThatiAR: Subjectivity Detection in Arabic News Sentences