We are concerned with the problem of hyperparameter selection for fitted
Q-evaluation (FQE). FQE is one of the state-of-the-art methods for offline
policy evaluation (OPE), which is essential to reinforcement learning
without environment simulators. However, like other OPE metho