Identifying how much a model ${\widehat{p}}_{\theta}(Y|X)$ knows about the
stochastic real-world process $p(Y|X)$ it was trained on is important to ensure
it avoids producing incorrect or "hallucinated" answers or taking unsafe
actions. But this is difficult for generative models because probabilistic
predictions do not distinguish between per-response noise (aleatoric
uncertainty) and lack of knowledge about the process (epistemic uncertainty),
and existing epistemic uncertainty quantification techniques tend to be
overconfident when the model underfits. We propose a general strategy for
teaching a model to both approximate $p(Y|X)$ and estimate the remaining
gaps between ${\widehat{p}}_{\theta}(Y|X)$ and $p(Y|X)$: train it to predict
pairs of independent responses drawn from the true conditional distribution,
allow it to "cheat" by observing one response while predicting the other, then
measure how much it cheats. Remarkably, we prove that being good at cheating
(i.e. cheating whenever it improves your prediction) is equivalent to being
second-order calibrated, a principled extension of ordinary calibration that
allows us to construct provably-correct frequentist confidence intervals for
$p(Y|X)$ and detect incorrect responses with high probability. We demonstrate
empirically that our approach accurately estimates how much models don't know
across ambiguous image classification, (synthetic) language modeling, and
partially-observable navigation tasks, outperforming existing techniques.
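
The "measure how much it cheats" idea can be illustrated with a small numerical sketch. It is not the paper's implementation: the helper names `pair_joint` and `cheating_gap` are hypothetical, and the toy setup (a single input $X$, two possible responses, and a model whose belief is a weighted mixture of candidate distributions) is an assumption made for illustration. If the predicted joint over a pair of responses factorizes into the product of its marginals, seeing one response does not help predict the other; any departure from that product indicates that the model would benefit from cheating.

```python
import numpy as np

def pair_joint(hypotheses, weights):
    """Joint over (Y1, Y2) when the pair is drawn i.i.d. from one of several
    candidate distributions, chosen with the given weights (toy assumption)."""
    hypotheses = np.asarray(hypotheses, dtype=float)  # shape (K, num_classes)
    weights = np.asarray(weights, dtype=float)        # shape (K,)
    # sum_k w_k * p_k(y1) * p_k(y2)
    return np.einsum("k,ki,kj->ij", weights, hypotheses, hypotheses)

def cheating_gap(joint):
    """Joint minus the product of its marginals. All zeros means observing Y1
    tells the model nothing new about Y2."""
    marginal = joint.sum(axis=1)
    return joint - np.outer(marginal, marginal)

# Case 1: the model knows the process is a fair coin (pure aleatoric noise).
known = pair_joint([[0.5, 0.5]], [1.0])

# Case 2: the model cannot tell whether the answer is always 0 or always 1
# (pure epistemic uncertainty). The marginal is the same fair coin.
unknown = pair_joint([[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5])

gap_known = cheating_gap(known)
gap_unknown = cheating_gap(unknown)
print(gap_known)
print(gap_unknown)
```

Both cases give the same marginal prediction, so a single-response model cannot tell them apart. Only the pair prediction separates them: the gap is zero in the first case and nonzero in the second, where the diagonal entries equal the variance of $p(Y|X)$ across the candidate distributions.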

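By training a model to predict the true conditional distribution and to estimate the gap between the model and that distribution, and by using the cheating strategy and second-order calibration to detect incorrect responses, the approach accurately estimates model uncertainty on ambiguous image classification, language modeling, and partially observable navigation tasks.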

Experts Don't Cheat: Learning What You Don't Know By Predicting Pairs

We present pyRDDLGym, a Python framework for auto-generation of OpenAI Gym
environments from RDDL declarative descriptions. The discrete time step
evolution of variables in RDDL is described by conditional probability
functions, which fits naturally into the Gym step scheme. Furthermore, since
RDDL is a lifted description, the modification and scaling up of environments
to support multiple entities and different configurations becomes trivial
rather than a tedious process prone to errors. We hope that pyRDDLGym will
bring fresh momentum to the reinforcement learning community by enabling easy
and rapid development of benchmarks, thanks to the unique expressive power of RDDL.
By providing explicit access to the model in the RDDL description, pyRDDLGym
can also facilitate research on hybrid approaches for learning from interaction
while leveraging model knowledge. We present the design and built-in examples
of pyRDDLGym, and the additions made to the RDDL language that were
incorporated into the framework.
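
The fit between conditional probability functions and the Gym step scheme can be sketched in plain Python. This is only a toy illustration and does not use pyRDDLGym's actual API: the class `ToyCpfEnv`, its reservoir dynamics, and the parameter `n_reservoirs` are all hypothetical. Each call to `step` samples every variable's next value from a conditional probability function of its current value and the action, and the number of entities is a constructor argument, loosely mirroring how a lifted description scales to more objects.

```python
import random

class ToyCpfEnv:
    """Hypothetical Gym-style environment; not pyRDDLGym's actual API."""

    def __init__(self, n_reservoirs=2, seed=0):
        self.n = n_reservoirs            # number of entities (assumed name)
        self.rng = random.Random(seed)   # private RNG for reproducibility
        self.levels = [0.0] * self.n

    def reset(self):
        # Every reservoir starts at the target level of 5.0.
        self.levels = [5.0] * self.n
        return list(self.levels)

    def _cpf_level(self, level, release):
        # Conditional probability function for one variable: the next level
        # depends on the current level, the action, and random rainfall.
        rain = self.rng.uniform(0.0, 1.0)
        return max(0.0, level + rain - release)

    def step(self, action):
        # One discrete time step: apply the same CPF to every entity.
        self.levels = [self._cpf_level(lv, a)
                       for lv, a in zip(self.levels, action)]
        # Reward penalizes distance from the target level.
        reward = -sum(abs(lv - 5.0) for lv in self.levels)
        done = False
        return list(self.levels), reward, done, {}

# Scaling to more entities only changes one argument.
env = ToyCpfEnv(n_reservoirs=3, seed=1)
obs = env.reset()
next_obs, reward, done, info = env.step([0.5, 0.5, 0.5])
print(next_obs, reward)
```

In this sketch the dynamics are hand-written; the abstract's point is that pyRDDLGym derives the equivalent of `_cpf_level` automatically from the RDDL description.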

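pyRDDLGym is a Python framework that auto-generates OpenAI Gym environments from RDDL descriptions, exposing model knowledge and supporting multiple entities and different configurations. It helps the reinforcement learning community develop new benchmarks quickly and facilitates research on hybrid approaches based on learning from interaction.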