Despite the recent progress in deep learning, most approaches still go for a
silo-like solution, focusing on learning each task in isolation: training a
separate neural network for each individual task. Many real-world problems,
however, call for a multi-modal approach and, therefore,