This paper introduces a novel approach to learn visually grounded meaning
representations of words as low-dimensional node embeddings on an underlying
graph hierarchy. The lower level of the hierarchy models modality-specific word
representations through dedicated but communicating graphs, while the higher
level puts these representations together on a singl