In this article, we describe how to perform end-to-end fact-checking in
over 100 languages using Factiverse AI models. We also show, through an
experimental benchmark, that models fine-tuned specifically for fact-checking
tasks outperform large language models (LLMs) such as GPT-4, GPT-3.5-Turbo,
and Mistral-7B.

End-to-end multilingual fact-checking at scale

We offer an experimental benchmark and empirical study for off-policy policy
evaluation (OPE) in reinforcement learning, a key problem in many
safety-critical applications. Given the growing interest in deploying
learning-based methods, there has been a flurry of recent proposals for OPE
methods, creating a need for standardized empirical analyses. Our work places
a strong emphasis on the diversity of experimental design so that OPE methods
can be stress-tested. We provide a comprehensive benchmarking suite for
studying how different attributes interact to affect method performance, and we
distill the results into a concise set of guidelines for OPE in practice. Our
software package, the Caltech OPE Benchmarking Suite (COBS), is open source,
and we invite interested researchers to contribute further to the benchmark.
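
To make concrete what an OPE method computes, here is a minimal sketch of two classical estimators, trajectory-wise importance sampling (IS) and weighted importance sampling (WIS). Both estimate the expected return of an evaluation policy using only trajectories logged under a different behavior policy. The sketch assumes each logged step records the probability of the taken action under both policies plus the reward; the `Step` record and the function names are hypothetical and are not the COBS API.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Step:
    """One logged transition (hypothetical record format, not the COBS API)."""
    p_behavior: float  # prob. of the logged action under the behavior (logging) policy
    p_eval: float      # prob. of the same action under the evaluation (target) policy
    reward: float


Trajectory = Sequence[Step]


def weight_and_return(traj: Trajectory, gamma: float) -> Tuple[float, float]:
    """Cumulative importance ratio and discounted return of one trajectory."""
    weight, ret, discount = 1.0, 0.0, 1.0
    for step in traj:
        weight *= step.p_eval / step.p_behavior  # product of per-step ratios
        ret += discount * step.reward
        discount *= gamma
    return weight, ret


def is_estimate(trajs: List[Trajectory], gamma: float = 1.0) -> float:
    """Trajectory-wise IS: unbiased given full support, but can have high variance."""
    pairs = [weight_and_return(t, gamma) for t in trajs]
    return sum(w * g for w, g in pairs) / len(pairs)


def wis_estimate(trajs: List[Trajectory], gamma: float = 1.0) -> float:
    """Weighted IS: normalizes by the total weight; biased but usually lower variance."""
    pairs = [weight_and_return(t, gamma) for t in trajs]
    total = sum(w for w, _ in pairs)
    if total == 0.0:
        raise ValueError("all importance weights are zero")
    return sum(w * g for w, g in pairs) / total
```

For example, with three one-step trajectories `[Step(0.5, 0.9, 1.0)]`, `[Step(0.5, 0.9, 0.0)]`, and `[Step(0.5, 0.1, 1.0)]`, IS returns 2.0 / 3 ≈ 0.667 while WIS returns 2.0 / 3.8 ≈ 0.526. The two estimators can disagree noticeably on the same data, which is the kind of behavior a benchmark with diverse experimental designs is meant to expose.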