deep reinforcement learning (DRL) has achieved significant breakthroughs in
various tasks. However, most DRL algorithms suffer a problem of generalizing
the learned policy which makes the learning performance largely affected even
by minor modifications of the training environment. Exc