Oct, 2023
Tuna: Instruction Tuning using Feedback from Large Language Models
Haoran Li, Yiran Liu, Xingxing Zhang, Wei Lu, Furu Wei
TL;DR
Tuna fine-tunes an already instruction-tuned model with probabilistic ranking and contextual ranking so that it learns to generate better responses; the resulting model outperforms strong reinforcement-learning baselines and improves performance across a variety of tasks.
Abstract
Instruction tuning of open-source large language models (LLMs) like LLaMA, using direct outputs from more powerful LLMs such as Instruct-GPT and GPT-4, has proven to be a cost-effective way to align model behaviors with human preferences.
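To make the ranking-based fine-tuning idea from the TL;DR concrete, below is a minimal sketch, not the paper's actual implementation: given several candidate responses to an instruction, sorted from best to worst by a stronger LLM, a pairwise margin loss pushes the fine-tuned model to assign higher length-normalized log-probability to better-ranked responses. The function name `ranking_finetune_loss` and the margin value are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def ranking_finetune_loss(logprobs: torch.Tensor, margin: float = 0.1) -> torch.Tensor:
    """Pairwise margin ranking loss over candidate responses (illustrative sketch).

    `logprobs` holds the length-normalized log-probabilities the
    instruction-tuned model assigns to k candidate responses, already
    sorted from best- to worst-ranked (e.g. by a stronger LLM).
    Each better-ranked response is pushed to score at least `margin`
    higher than every worse-ranked one.
    """
    k = logprobs.size(0)
    loss = logprobs.new_zeros(())
    for i in range(k):
        for j in range(i + 1, k):
            # response i is ranked above response j
            loss = loss + F.relu(margin - (logprobs[i] - logprobs[j]))
    return loss / (k * (k - 1) / 2)

# Toy usage: 4 candidate responses for one instruction, best-ranked first.
scores = torch.tensor([-1.2, -1.5, -1.4, -2.0], requires_grad=True)
print(ranking_finetune_loss(scores))  # positive when rankings are violated; backprop adjusts the model
```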