BriefGPT.xyz
Jul, 2023
Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback
Viet Dac Lai, Chien Van Nguyen, Nghia Trung Ngo, Thuat Nguyen, Franck Dernoncourt...
TL;DR
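Okapi is the first system to apply RLHF-based instruction tuning across multiple languages. It introduces instruction and response-ranking data in 26 different languages to facilitate experiments and development in future multilingual LLM research.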
Abstract
A key technology for the development of large language models (LLMs) involves instruction tuning that helps align the models' responses with human expectations to realize impressive learning abilities. Two major
→