machine learning advances in the last decade have relied significantly on
large-scale datasets that continue to grow in size. Increasingly, those
datasets also contain different data modalities. However, large multi-modal
datasets are hard to annotate, and annotations may contain biase