Significant advances have recently been achieved in multi-agent reinforcement learning (MARL) which tackles sequential decision-making problems involving multiple participants. However, MARL requires a tremendous number of samples for effective training. On the other hand,