BriefGPT.xyz
Sep, 2023
Exploring the Impact of Training Data Distribution and Subword Tokenization on Gender Bias in Machine Translation
Bar Iluz, Tomasz Limisiewicz, Gabriel Stanovsky, David Mareček
TL;DR
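We study the impact of tokenization on gender bias in machine translation, focusing on the interplay between the frequency of gendered profession names in the training data, their representation in the subword tokenizer's vocabulary, and gender bias.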
Abstract
We study the effect of tokenization on gender bias in machine translation, an aspect that has been largely overlooked in previous works. S
→