Training a Fully Convolutional Network (FCN) for semantic segmentation requires a large number of pixel-level masks, which involves a large amount of human labour and time for annotation. In contrast, image-level labels are much easier to obtain. In this work, we propose a novel method