In many machine learning systems that jointly learn from multiple modalities,
a core research question is to understand the nature of multimodal
interactions: the emergence of new task-relevant information during learning
from both modalities that was not present in either alone. We st