Large language models (LMs) are pretrained on diverse data sources: news, discussion forums, books, online encyclopedias. A significant portion of this data includes facts and opinions which, on one hand, celebrate democracy and diversity of ideas, and on the other hand are inherently socially biased. Our work develops new methods to (1) measure media biases in LMs trained on such corpora, along the social and economic axes, and (2) measure the fairness of downstream NLP models trained on top of politically biased LMs. We focus on hate speech and misinformation detection, aiming to empirically quantify the effects of political (social, economic) biases in pretraining data on the fairness of high-stakes social-oriented tasks. Our findings reveal that pretrained LMs do have political leanings which reinforce the polarization present in pretraining corpora, propagating social biases into hate speech predictions and media biases into misinformation detectors. We discuss the implications of our findings for NLP research and propose future directions to mitigate the unfairness.

本研究旨在测量大型语言模型中社会和经济偏见的媒体偏见，以及在预训练数据中表现出政治（社会，经济）偏见的先验模型对高风险社会导向任务的公平性的影响。结果发现先验模型确实存在政治倾向，这可能加剧原始数据中的偏见并将其传播到误导检测器之类的下游模型中，本研究讨论了这些发现对NLP研究的影响，并提出了减轻不公平的未来方向。

从预训练数据到语言模型到下游任务：跟踪导致不公正NLP模型的政治偏见