Existing visual question answering methods tend to capture spurious
cross-modal correlations and fail to discover the true causal mechanism that
facilitates truthful reasoning based on the dominant visual evidence and the
question intention. Additionally, the existing methods us
CausalVLR is an open-source, PyTorch-based toolbox that contains a diverse set of causal inference methods for various visual-linguistic reasoning tasks. It provides code and models for both training and inference.