A promising characteristic of deep reinforcement learning (DRL) is its
capability to learn optimal policy in an end-to-end manner without relying on
feature engineering. However, most approaches assume a fully observable state
space, i.e. fully observable Markov Decision Processes (MDP