multimodal semantic understanding often has to deal with uncertainty, which
means the obtained messages tend to refer to multiple targets. Such uncertainty
is problematic for our interpretation, including inter- and intra-modal
uncertainty. Little effort has studied the modeling of thi