Abstract

Infinite-horizon off-policy policy evaluation is a highly challenging task due to the excessively large variance of typical importance sampling (IS) estimators. Recently, Liu et al. (2018a) proposed an approach that significantly reduces the variance of infinite-horizon off-policy evaluation by estimating the density ratio of the stationary state distributions under the target and behavior policies.
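To make the contrast concrete, here is a minimal sketch of the two weighting schemes under standard notation (the symbols $\pi$, $\pi_0$, $d_\pi$, $d_{\pi_0}$, horizon $T$, and discount $\gamma$ are assumptions for illustration, not taken from this abstract): the trajectory-wise IS estimator multiplies $T$ per-step ratios, so its variance can grow exponentially in $T$, whereas the approach of Liu et al. (2018a) reweights individual transitions by an estimated stationary density ratio:
\[
\hat{R}_{\mathrm{IS}} \;=\; \frac{1}{n}\sum_{i=1}^{n}\Bigl(\prod_{t=0}^{T-1}\frac{\pi(a_t^i\mid s_t^i)}{\pi_0(a_t^i\mid s_t^i)}\Bigr)\sum_{t=0}^{T-1}\gamma^{t} r_t^i,
\qquad
\hat{R}_{\mathrm{ratio}} \;=\; \frac{1}{n}\sum_{i=1}^{n}\hat{w}(s^i)\,\frac{\pi(a^i\mid s^i)}{\pi_0(a^i\mid s^i)}\, r^i,
\]
where $\hat{w}(s)\approx d_\pi(s)/d_{\pi_0}(s)$ is the estimated ratio of stationary state distributions.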