2017-06 | Transformers | Google | Attention Is All You Need |
2018-06 | GPT 1.0 | OpenAI | Improving Language Understanding by Generative Pre-Training |
2018-10 | BERT | Google | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding |
2019-02 | GPT 2.0 | OpenAI | Language Models are Unsupervised Multitask Learners |
2019-09 | Megatron-LM | NVIDIA | Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism |
2019-10 | T5 | Google | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer |
2019-10 | ZeRO | Microsoft | ZeRO: Memory Optimizations Toward Training Trillion Parameter Models |
2020-01 | Scaling Law | OpenAI | Scaling Laws for Neural Language Models |
2020-05 | GPT 3.0 | OpenAI | Language Models are Few-Shot Learners |
2021-01 | Switch Transformers | Google | Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity |
2021-07 | Codex | OpenAI | Evaluating Large Language Models Trained on Code |
2021-08 | Foundation Models | Stanford | On the Opportunities and Risks of Foundation Models |
2021-09 | FLAN | Google | Finetuned Language Models are Zero-Shot Learners |
2021-10 | T0 | HuggingFace et al. | Multitask Prompted Training Enables Zero-Shot Task Generalization |
2021-12 | GLaM | Google | GLaM: Efficient Scaling of Language Models with Mixture-of-Experts |
2021-12 | WebGPT | OpenAI | WebGPT: Browser-assisted question-answering with human feedback |
2021-12 | Retro | DeepMind | Improving language models by retrieving from trillions of tokens |
2021-12 | Gopher | DeepMind | Scaling Language Models: Methods, Analysis & Insights from Training Gopher |
2022-01 | COT | Google | Chain-of-Thought Prompting Elicits Reasoning in Large Language Models |
2022-01 | LaMDA | Google | LaMDA: Language Models for Dialog Applications |
2022-01 | Megatron-Turing NLG | Microsoft&NVIDIA | Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model |
2022-03 | InstructGPT | OpenAI | Training language models to follow instructions with human feedback |
2022-04 | PaLM | Google | PaLM: Scaling Language Modeling with Pathways |
2022-04 | Chinchilla | DeepMind | An empirical analysis of compute-optimal large language model training |
2022-05 | OPT | Meta | OPT: Open Pre-trained Transformer Language Models |
2022-05 | UL2 | Google | Unifying Language Learning Paradigms |
2022-06 | Emergent Abilities | Google | Emergent Abilities of Large Language Models |
2022-06 | BIG-bench | Google | Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models |
2022-06 | METALM | Microsoft | Language Models are General-Purpose Interfaces |
2022-06 | Minerva | Google | Solving Quantitative Reasoning Problems with Language Models |
2022-09 | Sparrow | DeepMind | Improving alignment of dialogue agents via targeted human judgements |
2022-10 | Flan-T5/PaLM | Google | Scaling Instruction-Finetuned Language Models |
2022-10 | GLM-130B | Tsinghua | GLM-130B: An Open Bilingual Pre-trained Model |
2022-11 | HELM | Stanford | Holistic Evaluation of Language Models |
2022-11 | BLOOM | BigScience | BLOOM: A 176B-Parameter Open-Access Multilingual Language Model |
2022-11 | Galactica | Meta | Galactica: A Large Language Model for Science |
2022-12 | OPT-IML | Meta | OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization |
2023-01 | Flan 2022 Collection | Google | The Flan Collection: Designing Data and Methods for Effective Instruction Tuning |
2023-02 | LLaMA | Meta | LLaMA: Open and Efficient Foundation Language Models |
2023-02 | Kosmos-1 | Microsoft | Language Is Not All You Need: Aligning Perception with Language Models |
2023-03 | LRU | DeepMind | Resurrecting Recurrent Neural Networks for Long Sequences |
2023-03 | PaLM-E | Google | PaLM-E: An Embodied Multimodal Language Model |
2023-03 | GPT 4 | OpenAI | GPT-4 Technical Report |
2023-04 | LLaVA | UW–Madison&Microsoft | Visual Instruction Tuning |
2023-04 | Pythia | EleutherAI et al. | Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling |
2023-05 | Dromedary | CMU et al. | Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision |
2023-05 | PaLM 2 | Google | PaLM 2 Technical Report |
2023-05 | RWKV | Bo Peng | RWKV: Reinventing RNNs for the Transformer Era |
2023-05 | DPO | Stanford | Direct Preference Optimization: Your Language Model is Secretly a Reward Model |
2023-05 | ToT | Google&Princeton | Tree of Thoughts: Deliberate Problem Solving with Large Language Models |
2023-07 | LLaMA2 | Meta | Llama 2: Open Foundation and Fine-Tuned Chat Models |
2023-10 | Mistral 7B | Mistral | Mistral 7B |
2023-12 | Mamba | CMU&Princeton | Mamba: Linear-Time Sequence Modeling with Selective State Spaces |
2024-02 | OLMo | Ai2 | OLMo: Accelerating the Science of Language Models |
2024-05 | DeepSeek-V2 | DeepSeek | DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model |
2024-05 | Mamba2 | CMU&Princeton | Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality |
2024-06 | FineWeb | HuggingFace | The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale |
2024-07 | Llama3 | Meta | The Llama 3 Herd of Models |
2024-09 | OLMoE | Ai2 | OLMoE: Open Mixture-of-Experts Language Models |
2024-12 | Qwen2.5 | Alibaba | Qwen2.5 Technical Report |
2024-12 | DeepSeek-V3 | DeepSeek | DeepSeek-V3 Technical Report |
2025-01 | DeepSeek-R1 | DeepSeek | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning |