Visual explanations of the representations learned by a model help in understanding the fundamentals of learning. Previous works on attentional models visualized the attended regions of an image or text using the learned attention weights, to confirm that the mechanism behaved as intended. Kim et al. (2016) show that the Hadamard product in multimodal deep networks, which is well known as the joint function in visual question answering models, implicitly performs an attentional mechanism over visual inputs. In this work, we extend their result and show, using a proposed gradient-based visualization technique, that the Hadamard product performs this attentional mechanism not only over visual inputs but also over textual inputs simultaneously. The attentional effect of the Hadamard product is visualized for both visual and textual inputs by analyzing its two inputs and its output with the proposed method, and the visualizations are compared with the learned attention weights of a visual question answering model.
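To make the idea of a gradient-based view of the Hadamard product concrete, the following is a minimal sketch and not the paper's exact procedure. It assumes a toy joint function in which word embeddings and image-region features are each projected with a tanh layer, sum-pooled, fused by an element-wise (Hadamard) product, and mapped to a scalar score. The matrices `U` and `V`, the vector `w_out`, the pooling choice, and all toy values are hypothetical. Because the score is a product of the two branches, the gradient with respect to any word is scaled by the pooled visual feature, and the gradient with respect to any region is scaled by the pooled textual feature, which is the gating behavior that makes the product act like attention on both inputs.

```python
import math

def project(W, x):
    """tanh(W x) for a matrix W given as a list of rows."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]

def pooled(W, items):
    """Sum of projected features over all words (or regions)."""
    feats = [project(W, x) for x in items]
    return [sum(col) for col in zip(*feats)]

def score(U, words, V, regions, w_out):
    """Scalar score w_out . (q * v), where * is the Hadamard product."""
    q, v = pooled(U, words), pooled(V, regions)
    return sum(wo * qk * vk for wo, qk, vk in zip(w_out, q, v))

def input_gradient(W, x, other, w_out):
    """Analytic gradient of the score w.r.t. one input x of the branch using W.
    `other` is the pooled feature of the opposite modality."""
    h = project(W, x)
    # Gradient at the pre-activation: the opposite modality gates every channel.
    g_pre = [wo * o * (1.0 - hk * hk) for wo, o, hk in zip(w_out, other, h)]
    # Chain rule back through the linear projection W.
    return [sum(W[k][j] * g_pre[k] for k in range(len(W))) for j in range(len(x))]

def saliency(W, items, other, w_out):
    """Gradient L2 norm per item, normalized to sum to 1 (all zeros if no gradient)."""
    norms = [math.sqrt(sum(g * g for g in input_gradient(W, x, other, w_out)))
             for x in items]
    total = sum(norms) or 1.0
    return [n / total for n in norms]

# Hypothetical toy parameters and inputs (3 fused channels, 2-d inputs).
U = [[0.5, -0.2], [0.1, 0.8], [-0.3, 0.4]]    # text projection
V = [[0.7, 0.1], [-0.4, 0.6], [0.2, -0.5]]    # visual projection
w_out = [1.0, -0.5, 0.3]
words = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # three word embeddings
regions = [[0.2, 0.9], [1.0, -1.0]]           # two image-region features

word_saliency = saliency(U, words, pooled(V, regions), w_out)
region_saliency = saliency(V, regions, pooled(U, words), w_out)
print("word saliency:", word_saliency)
print("region saliency:", region_saliency)
```

The resulting per-word and per-region values can be read as a saliency map over each modality. In a real model the same quantities would normally come from automatic differentiation rather than the hand-derived gradient used here.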

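This paper extends the work of Kim et al. (2016) by proposing a gradient-based visualization technique. It shows that the Hadamard product in multimodal deep networks applies not only to visual inputs but also to textual inputs at the same time, and that the technique can visualize the attention mechanism of the Hadamard product over both visual and textual inputs.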

Visual Explanation of the Hadamard Product in Multimodal Deep Networks