Timeline of Large Language Models | BriefGPT - AI Paper Digest

Last updated: 2023-06-19

baichuan-7B

Release date: 2023-06-15
Parameters: 7 billion
Organization: Baichuan Intelligence
GitHub: https://github.com/baichuan-inc/baichuan-7B

Aquila-7B

Release date: 2023-06-10
Parameters: 7 billion
Organization: BAAI
GitHub: https://github.com/FlagAI-Open/FlagAI/tree/master/examples/Aquila

Falcon

Release date: 2023-05-24
Parameters: 40 billion
Organization: Technology Innovation Institute
Hugging Face: https://huggingface.co/tiiuae/falcon-40b

Guanaco

Release date: 2023-05-23
Parameters: 7 billion to 65 billion
Organization: University of Washington
Paper: QLoRA: Efficient Finetuning of Quantized LLMs

RWKV

Release date: 2023-05-22
Parameters: 7 billion
Organization: RWKV Foundation
Paper: RWKV: Reinventing RNNs for the Transformer Era

CodeT5+

Release date: 2023-05-13
Parameters: 16 billion
Organization: Salesforce
Paper: CodeT5+: Open Code Large Language Models for Code Understanding and Generation

PaLM2

Release date: 2023-05-10
Parameters: 1 billion to 10 billion
Organization: Google
Paper: PaLM 2 Technical Report

RedPajama INCITE

Release date: 2023-05-05
Parameters: 2.8 billion
Organization: TOGETHER
Paper: Releasing 3B and 7B RedPajama-INCITE family of models including base, instruction-tuned & chat models
Hugging Face: https://huggingface.co/togethercomputer/RedPajama-INCITE-Instruct-3B-v1

MPT

Release date: 2023-05-05
Parameters: 7 billion
Organization: MosaicML
Paper: Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs

StarCoder

Release date: 2023-05-05
Parameters: 7 billion
Organization: Hugging Face
Paper: StarCoder: May the Source Be with You!
GitHub: https://github.com/bigcode-project/starcoder/

OpenLLaMA

Release date: 2023-05-03
Parameters: 7 billion
Organization: Berkeley Artificial Intelligence Research
Paper: OpenLLaMA: An Open Reproduction of LLaMA
GitHub: https://github.com/openlm-research/open_llama

StableLM

Release date: 2023-04-20
Parameters: 3 billion and 7 billion
Organization: Stability AI
Paper: Stability AI Launches the First of its StableLM Suite of Language Models

Koala

Release date: 2023-04-03
Parameters: 13 billion
Organization: Berkeley Artificial Intelligence Research
Paper: Koala: A Dialogue Model for Academic Research
GitHub: https://github.com/young-geng/EasyLM

Vicuna-13B

Release date: 2023-03-31
Parameters: 13 billion
Organization: LM-SYS
Paper: Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90% ChatGPT Quality
GitHub: https://github.com/lm-sys/FastChat

BloombergGPT

Release date: 2023-03-30
Parameters: 50 billion
Organization: Bloomberg
Paper: BloombergGPT: A Large Language Model for Finance

GPT4All

Release date: 2023-03-29
Parameters: 7 billion
Organization: Nomic AI
Paper: GPT4All: Training an Assistant-style Chatbot with Large Scale Data Distillation from GPT-3.5-Turbo
GitHub: https://github.com/nomic-ai/gpt4all

Dolly

Release date: 2023-03-24
Parameters: 6 billion
Organization: Databricks
Paper: Hello Dolly: Democratizing the magic of ChatGPT with open models
Hugging Face: https://huggingface.co/databricks/dolly-v1-6b

ChatGLM-6B

Release date: 2023-03-14
Parameters: 6.2 billion
Organization: Tsinghua University
Paper: ChatGLM-6B: An Open Bilingual Dialogue Language Model
GitHub: https://github.com/THUDM/ChatGLM-6B

GPT-4

Release date: 2023-03-14
Parameters: Unknown
Organization: OpenAI
Paper: GPT-4 Technical Report

Stanford Alpaca

Release date: 2023-03-13
Parameters: 7 billion
Organization: Stanford
Paper: Alpaca: A Strong, Replicable Instruction-Following Model
GitHub: https://github.com/tatsu-lab/stanford_alpaca

LLaMA

Release date: 2023-02-24
Parameters: 7 billion to 65 billion
Organization: Meta
Paper: LLaMA: Open and Efficient Foundation Language Models
GitHub: https://github.com/facebookresearch/llama

GPT-3.5

Release date: 2022-11-30
Parameters: 175 billion
Organization: OpenAI
Paper: GPT-3.5 Model

BLOOM

Release date: 2022-11-09
Parameters: 176 billion
Organization: BigScience
Paper: BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
Hugging Face: https://huggingface.co/bigscience/bloom

BLOOMZ

Release date: 2022-11-03
Parameters: 176 billion
Organization: BigScience
Paper: Crosslingual Generalization through Multitask Finetuning

mT0

Release date: 2022-11-03
Parameters: 13 billion
Organization: BigScience
Paper: Crosslingual Generalization through Multitask Finetuning

Flan-U-PaLM

Release date: 2022-10-20
Parameters: 540 billion
Organization: Google
Paper: Scaling Instruction-Finetuned Language Models

Flan-T5

Release date: 2022-10-20
Parameters: 11 billion
Organization: Google
Paper: Scaling Instruction-Finetuned Language Models
GitHub: https://github.com/google-research/t5x/blob/main/docs/models.md

WeLM

Release date: 2022-09-21
Parameters: 10 billion
Organization: WeChat
Paper: WeLM: A Well-Read Pre-trained Language Model for Chinese

PLUG

Release date: 2022-09-01
Parameters: 27 billion
Organization: Alibaba DAMO Academy
Paper: PLUG: Pre-training for Language Understanding and Generation
GitHub: https://github.com/alibaba/AliceMind/tree/main/PLUG

OPT

Release date: 2022-05-02
Parameters: 175 billion
Organization: Meta
Paper: OPT: Open Pre-trained Transformer Language Models
GitHub: https://github.com/facebookresearch/metaseq/tree/main/projects/OPT

PaLM

Release date: 2022-04-05
Parameters: 540 billion
Organization: Google
Paper: PaLM: Scaling Language Modeling with Pathways
GitHub: https://github.com/lucidrains/PaLM-pytorch

Chinchilla

Release date: 2022-03-29
Parameters: 70 billion
Organization: Google DeepMind
Paper: Training Compute-Optimal Large Language Models

CodeGen

Release date: 2022-03-25
Parameters: 16 billion
Organization: Salesforce
Paper: CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis
GitHub: https://github.com/salesforce/codegen

GLM-130B

Release date: 2022-03-17
Parameters: 130 billion
Organization: Tsinghua University
Paper: GLM: General Language Model Pretraining with Autoregressive Blank Infilling
GitHub: https://github.com/THUDM/GLM-130B

InstructGPT

Release date: 2022-03-04
Parameters: 175 billion
Organization: OpenAI
Paper: Training Language Models to Follow Instructions with Human Feedback
GitHub: https://github.com/openai/following-instructions-human-feedback

AlphaCode

Release date: 2022-02-08
Parameters: 41 billion
Organization: Google DeepMind
Paper: Competition-Level Code Generation with AlphaCode

MT-NLG

Release date: 2022-01-28
Parameters: 530 billion
Organization: Microsoft
Paper: Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model

LaMDA

Release date: 2022-01-20
Parameters: 137 billion
Organization: Google
Paper: LaMDA: Language Models for Dialog Applications

WebGPT

Release date: 2021-12-17
Parameters: 175 billion
Organization: OpenAI
Paper: WebGPT: Browser-assisted question-answering with human feedback

GLaM

Release date: 2021-12-13
Parameters: 1.2 trillion
Organization: Google
Paper: GLaM: Efficient Scaling of Language Models with Mixture-of-Experts

Gopher

Release date: 2021-12-08
Parameters: 280 billion
Organization: Google DeepMind
Paper: Scaling Language Models: Methods, Analysis & Insights from Training Gopher

T0

Release date: 2021-10-15
Parameters: 11 billion
Organization: Hugging Face
Paper: Multitask Prompted Training Enables Zero-Shot Task Generalization

FLAN

Release date: 2021-09-03
Parameters: 137 billion
Organization: Google
Paper: Finetuned Language Models Are Zero-Shot Learners

Codex

Release date: 2021-07-07
Parameters: 12 billion
Organization: OpenAI
Paper: Evaluating Large Language Models Trained on Code

ERNIE 3.0

Release date: 2021-07-05
Parameters: 10 billion
Organization: Baidu
Paper: ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation
GitHub: https://github.com/PaddlePaddle/PaddleNLP/tree/develop/model_zoo/ernie-3.0

PanGu-Alpha

Release date: 2021-04-26
Parameters: 200 billion
Organization: Huawei
Paper: PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation
GitHub: https://openi.pcl.ac.cn/PCL-Platform.Intelligence/PanGu-Alpha

Switch Transformer

Release date: 2021-01-11
Parameters: 1.6 trillion
Organization: Google
Paper: Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

mT5

Release date: 2020-10-22
Parameters: 13 billion
Organization: Google
Paper: mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer

GShard

Release date: 2020-06-30
Parameters: 600 billion
Organization: Google
Paper: GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding

GPT-3

Release date: 2020-05-28
Parameters: 175 billion
Organization: OpenAI
Paper: Language Models are Few-Shot Learners

Turing-NLG

Release date: 2020-02-13
Parameters: 17 billion
Organization: Microsoft
Paper: Turing-NLG: A 17-billion-parameter language model by Microsoft

T5

Release date: 2019-10-23
Parameters: 11 billion
Organization: Google
Paper: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
GitHub: https://github.com/google-research/t5x

XLNet

Release date: 2019-06-19
Parameters: 340 million
Organization: Google Brain
Paper: XLNet: Generalized Autoregressive Pretraining for Language Understanding
GitHub: https://github.com/zihangdai/xlnet

Baidu-ERNIE

Release date: 2019-04-19
Parameters: 340 million
Organization: Baidu
Paper: ERNIE: Enhanced Representation through Knowledge Integration
GitHub: https://github.com/PaddlePaddle/ERNIE

GPT-2

Release date: 2019-02-14
Parameters: 1.5 billion
Organization: OpenAI
Paper: Language Models are Unsupervised Multitask Learners
GitHub: https://github.com/openai/gpt-2

BERT

Release date: 2018-10-11
Parameters: 340 million
Organization: Google
Paper: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
GitHub: https://github.com/google-research/bert

GPT-1

Release date: 2018-06-11
Parameters: 117 million
Organization: OpenAI
Paper: Improving Language Understanding by Generative Pre-Training
GitHub: https://github.com/openai/finetune-transformer-lm