Reinforcement learning (RL) methods learn optimal decisions in the presence
of a stationary environment. However, the stationarity assumption on the
environment is very restrictive. In many real-world problems, such as traffic
signal control and robotic applications, one often encounters situat