AbstractIn the field of
reinforcement learning there has been recent progress towards safety and high-confidence bounds on policy performance. However, to our knowledge, no methods exist for determining high-confidence safety bounds for a given evaluation policy in the inverse
→