multimodal fusion integrates the complementary information present in
multiple modalities and has gained much attention recently. Most existing
fusion approaches either learn a fixed fusion strategy during training and
inference, or are only capable of fusing the information to a certa