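TL;DR: Using easy-to-hard generalization and evaluators, this paper proposes a scalable AI alignment approach for hard reasoning tasks that exceed the level of human supervision, improving the accuracy of generator models on math problems.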
Abstract
Current AI alignment methodologies rely on human-provided demonstrations or
judgments, and the learned capabilities of AI systems would be upper-bounded by
human capabilities as a result. This raises a challenging research question:
How can we keep improving the systems when their capa