To measure bias, we encourage teams to consider using AUC Gap: the absolute difference between the highest and lowest test AUC across subgroups (e.g., gender, race, socioeconomic status (SES), prior knowledge). The measure is agnostic to the AI/ML algorithm used, and it captures the disparity in model performance across any number of subgroups, which enables non-binary fairness assessments, for example across intersectional identity groups. The LEVI teams use a wide range of AI/ML models in pursuit of a common goal of doubling math achievement in low-income middle schools. Because these models are trained on datasets collected in many different contexts, ensuring that they do not introduce or amplify biases is important for achieving the LEVI goal. We offer here a versatile and easy-to-compute measure of model bias for all LEVI teams, in order to create a common benchmark and an analytical basis for sharing which strategies have worked for different teams.
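
The definition above can be sketched in a few lines of code. The following is a minimal illustration, not a reference implementation: the function names `auc` and `auc_gap` and the toy data are assumptions made for this example, and AUC is computed here with the rank-based (Mann-Whitney) formula so that the block has no dependencies. In practice a team would likely substitute its own AUC routine, such as scikit-learn's `roc_auc_score`.

```python
from collections import defaultdict


def auc(labels, scores):
    """AUC via the Mann-Whitney rank formula, using average ranks for ties.

    labels: iterable of 0/1 ground-truth values (hypothetical input format).
    scores: iterable of model scores, higher meaning "more likely positive".
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined unless both classes are present")

    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # Find the run of tied scores pairs[i:j]; they share the average rank.
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        positives_in_run = sum(1 for k in range(i, j) if pairs[k][1] == 1)
        rank_sum_pos += avg_rank * positives_in_run
        i = j

    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def auc_gap(labels, scores, groups):
    """Return (gap, per_group_auc) for test-set predictions.

    groups: one subgroup key per example. A key can be a single attribute
    ("female") or a tuple for an intersectional group (("female", "low-SES")).
    """
    by_group = defaultdict(lambda: ([], []))
    for y, s, g in zip(labels, scores, groups):
        by_group[g][0].append(y)
        by_group[g][1].append(s)

    per_group = {g: auc(ys, ss) for g, (ys, ss) in by_group.items()}
    gap = max(per_group.values()) - min(per_group.values())
    return gap, per_group


# Toy test-set predictions for two subgroups (made-up values, illustration only).
labels = [0, 0, 1, 1, 0, 1, 0, 1]
scores = [0.1, 0.2, 0.8, 0.9, 0.1, 0.2, 0.8, 0.9]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

gap, per_group = auc_gap(labels, scores, groups)
print(per_group)  # per-subgroup test AUC
print(gap)        # highest AUC minus lowest AUC
```

Because the gap is taken between the maximum and minimum subgroup AUC, the same function applies unchanged to two subgroups or to many. A subgroup that lacks either positive or negative examples in the test set has no defined AUC, and this sketch raises an error in that case rather than guessing a value.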

Fairness Hub Technical Brief: AUC Gap