TL;DR: This paper proposes a method that uses unlabeled data for self-training to improve reasoning, achieving SOTA performance on multiple tasks via fine-tuning.
Abstract
Large language models (LLMs) have achieved excellent performance on various
tasks. However, fine-tuning an LLM requires extensive supervision. Humans, on
the other hand, may improve their reasoning abilities by s