InterFair: Debiasing with Natural Language Feedback for Fair Interpretable Predictions

Oct, 2022

Bodhisattwa Prasad Majumder, Zexue He, Julian McAuley

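TL;DR: This paper argues that debiasing methods for NLP models should use sensitive information fairly rather than blindly eliminating it. To achieve this, the authors propose an interactive approach in which users engage with the model and give feedback. The result is a better and fairer balance between task performance and bias mitigation, supported by detailed explanations.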

Abstract

Debiasing methods in NLP models traditionally focus on isolating information related to a sensitive attribute (such as gender or race). We instead argue that a favorable debiasing method should use →