The standard approach to solving Partially Observable Markov Decision Processes (POMDPs) is to convert them to a fully observed belief-state MDP. However, computing the belief state requires knowledge of the system model, which is unavailable in reinforcement learning (RL) settings. A widely used alternative