List of large language models

A large language model (LLM) is a type of machine learning model designed for natural language processing tasks such as language generation. LLMs are language models with many parameters, and are trained with self-supervised learning on a vast amount of text.

List

For the training cost column, 1 petaFLOP-day equals 1 petaFLOP/sec × 1 day, or 8.64×10¹⁹ FLOP (floating point operations). Only the cost of the largest model is shown. The number of parameters is measured in billions,^[a] and the training cost is measured in petaFLOP-days.
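
The unit conversion above can be sketched in a few lines of Python. This is only an illustrative helper, not part of the source list: the function and constant names are made up for the example, and the two sample inputs are the GPT-3 training cost (3640 petaFLOP-days) and the Llama 3.1 FLOP count (3.8E25) quoted in the tables below.

```python
# Minimal sketch of the petaFLOP-day conversion described above.
# All names here are hypothetical helpers, not an established API.

PETA = 10**15                      # 1 petaFLOP = 10^15 floating point operations
SECONDS_PER_DAY = 24 * 60 * 60     # 86,400 seconds

# 1 petaFLOP/sec sustained for 1 day = 8.64 x 10^19 FLOP
FLOP_PER_PETAFLOP_DAY = PETA * SECONDS_PER_DAY


def petaflop_days_to_flop(pf_days):
    """Convert a training cost in petaFLOP-days to total FLOP."""
    return pf_days * FLOP_PER_PETAFLOP_DAY


def flop_to_petaflop_days(flop):
    """Convert a total FLOP count to petaFLOP-days."""
    return flop / FLOP_PER_PETAFLOP_DAY


if __name__ == "__main__":
    # GPT-3 row: 3640 petaFLOP-days
    print(f"{petaflop_days_to_flop(3640):.3e} FLOP")
    # Llama 3.1 row: 3.8E25 FLOP, listed as roughly 440,000 petaFLOP-days
    print(f"{flop_to_petaflop_days(3.8e25):,.0f} petaFLOP-days")
```

Under these definitions, the Llama 3.1 figure of 3.8E25 FLOP works out to just under 440,000 petaFLOP-days, which agrees with the rounded value in the training cost column.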

2018

Name	Release date^[b]	Developer	Number of parameters	Corpus size	Training cost	License^[c]	Notes
GPT-1	Jun 11, 2018	OpenAI	0.117B	Unknown	1^[1]	MIT^[2]	First GPT model, decoder-only transformer. Trained for 30 days on 8 P600 GPUs.^[3]
BERT	Oct 2018	Google	0.340B^[4]	3.3B words^[4]	9^[5]	Apache 2.0^[6]	An early and influential language model.^[7] Encoder-only and thus not built to be prompted or generative.^[8] Training took 4 days on 64 TPUv2 chips.^[4]

2019

Name	Release date^[b]	Developer	Number of parameters	Corpus size	Training cost	License^[c]	Notes
T5	Oct 2019	Google	11B^[9]	34B tokens^[9]	Unknown	Apache 2.0^[10]	Base model for Google projects like Imagen.^[11]
XLNet	Jun 2019	Google	0.340B^[12]	33B words	330	Apache 2.0^[13]	An alternative to BERT; designed as encoder-only. Trained on 512 TPU v3 chips for 5.5 days.^[14]
GPT-2	Feb 2019	OpenAI	1.5B^[15]	40GB^[16] (~10B tokens)^[17]	28^[18]	MIT^[19]	Trained on 32 TPUv3 chips for 1 week.^[18]

2020

Name	Release date^[b]	Developer	Number of parameters	Corpus size	Training cost	License^[c]	Notes
GPT-3	May 2020	OpenAI	175B^[20]	300B tokens^[17]	3640^[21]	Proprietary	A fine-tuned variant of GPT-3, termed GPT-3.5, was made available to the public through ChatGPT in 2022.^[22]

2021

Name	Release date^[b]	Developer	Number of parameters	Corpus size	Training cost	License^[c]	Notes
GPT-Neo	Mar 2021	EleutherAI	2.7B^[23]	825 GiB^[24]	Unknown	MIT^[25]	The first of a series of free GPT-3 alternatives released by EleutherAI. GPT-Neo outperformed an equivalent-size GPT-3 model on some benchmarks, but was significantly worse than the largest GPT-3.^[25]
GPT-J	Jun 2021	EleutherAI	6B^[26]	825 GiB^[24]	200^[27]	Apache 2.0
Megatron-Turing NLG	Oct 2021^[28]	Microsoft and Nvidia	530B^[29]	338.6B tokens^[29]	38000^[30]	Unreleased	Trained for 3 months on over 2000 A100 GPUs on the NVIDIA Selene Supercomputer, for over 3 million GPU-hours.^[30]
Ernie 3.0 Titan	Dec 2021	Baidu	260B^[31]	4TB	Unknown	Proprietary
Claude^[32]	Dec 2021	Anthropic	52B^[33]	400B tokens^[33]	Unknown	Proprietary	Fine-tuned for desirable behavior in conversations.^[34]
GLaM (Generalist Language Model)	Dec 2021	Google	1200B^[35]	1.6T tokens^[35]	5600^[35]	Proprietary
Gopher	Dec 2021	Google DeepMind	280B^[36]	300B tokens^[37]	5833^[38]	Proprietary

2022

Name	Release date^[b]	Developer	Number of parameters	Corpus size	Training cost	License^[c]	Notes
LaMDA (Language Models for Dialog Applications)	Jan 2022	Google	137B^[39]	1.56T words,^[39] 168B tokens^[37]	4110^[40]	Proprietary
GPT-NeoX	Feb 2022	EleutherAI	20B^[41]	825 GiB^[24]	740^[27]	Apache 2.0
Chinchilla	Mar 2022	Google DeepMind	70B^[42]	1.4T tokens^[42]^[37]	6805^[38]	Proprietary
PaLM (Pathways Language Model)	Apr 2022	Google	540B^[43]	768B tokens^[42]	29,250^[38]	Proprietary	Trained for ~60 days on ~6000 TPU v4 chips.^[38]
OPT (Open Pretrained Transformer)	May 2022	Meta	175B^[44]	180B tokens^[45]	310^[27]	Non-commercial research^[d]	GPT-3 architecture with some adaptations from Megatron. The training logbook written by the team was published.^[46]
YaLM 100B	Jun 2022	Yandex	100B^[47]	1.7TB^[47]	Unknown	Apache 2.0
Minerva	Jun 2022	Google	540B^[48]	38.5B tokens from webpages filtered for math content and from arXiv^[48]	Unknown	Proprietary	For solving "mathematical and scientific questions using step-by-step reasoning".^[49]
BLOOM	Jul 2022	Large collaboration led by Hugging Face	175B^[50]	350B tokens (1.6TB)^[51]	Unknown	Responsible AI
Galactica	Nov 2022	Meta	120B	106B tokens^[52]	Unknown	CC-BY-NC-4.0
AlexaTM (Teacher Models)	Nov 2022	Amazon	20B^[53]	1.3T^[54]	Unknown	Proprietary^[55]

2023

Name	Release date^[b]	Developer	Number of parameters	Corpus size	Training cost	License^[c]	Notes
Llama	Feb 2023	Meta AI	65B^[56]	1.4T^[56]	6300^[57]	Non-commercial research^[e]
GPT-4	Mar 2023	OpenAI	Unknown^[f] (rumored: 1760B)^[59]	Unknown	Unknown, estimated 230,000	Proprietary
Cerebras-GPT	Mar 2023	Cerebras	13B^[60]		270^[27]	Apache 2.0
Falcon	Mar 2023	Technology Innovation Institute	40B^[61]	1T tokens, from RefinedWeb (filtered web text corpus)^[62] plus some "curated corpora".^[63]	2800^[57]	Apache 2.0^[64]
BloombergGPT	Mar 2023	Bloomberg L.P.	50B	363B tokens from Bloomberg's proprietary data sources, plus 345B tokens from general purpose datasets^[65]	Unknown	Unreleased	Designed for financial tasks.^[65]
PanGu-Σ	Mar 2023	Huawei	1085B	329B tokens^[66]	Unknown	Proprietary
OpenAssistant^[67]	Mar 2023	LAION	17B	1.5T tokens	Unknown	Apache 2.0
Jurassic-2^[68]^[69]	Mar 2023	AI21 Labs	Unknown	Unknown	Unknown	Proprietary
PaLM 2 (Pathways Language Model 2)	May 2023	Google	340B^[70]	3.6T tokens^[70]	85,000^[57]	Proprietary	Used in the Bard chatbot.^[71]
YandexGPT	May 17, 2023	Yandex	Unknown	Unknown	Unknown	Proprietary
Phi-1	Jun 21, 2023	Microsoft	1.3B^[72]	7B tokens^[72]	Unknown	MIT	Trained for 4 days on 8 A100s.^[72]
Llama 2	Jul 2023	Meta AI	70B^[73]	2T tokens^[73]	21,000	Llama 2	Trained over 3.3 million GPU (A100) hours.^[74]
Claude 2	Jul 2023	Anthropic	Unknown	Unknown	Unknown	Proprietary	Used in the Claude chatbot.^[75]
Granite 13b	Jul 2023	IBM	Unknown	Unknown	Unknown	Proprietary	Used in IBM Watsonx.^[76]
Mistral 7B	Sep 2023	Mistral AI	7.3B^[77]	Unknown	Unknown	Apache 2.0
YandexGPT 2	Sep 7, 2023	Yandex	Unknown	Unknown	Unknown	Proprietary
Claude 2.1	Nov 2023	Anthropic	Unknown	Unknown	Unknown	Proprietary	Used in the Claude chatbot. Has a context window of 200,000 tokens, or ~500 pages.^[78]
Grok-1^[79]	Nov 2023	xAI	314B	Unknown	Unknown	Apache 2.0	Used in the Grok chatbot. Grok-1 has a context length of 8,192 tokens and has access to X (Twitter).^[80]
Gemini 1.0	Dec 2023	Google DeepMind	Unknown	Unknown	Unknown	Proprietary	Multimodal model, comes in three sizes. Used in the chatbot of the same name.^[81]
Mixtral 8x7B	Dec 2023	Mistral AI	46.7B	Unknown	Unknown	Apache 2.0	Outperforms GPT-3.5 and Llama 2 70B on many benchmarks.^[82] Mixture of experts model, with 12.9 billion parameters activated per token.^[83]
DeepSeek-LLM	Nov 29, 2023	DeepSeek	67B	2T tokens^[84]^{: table 2}	12,000	DeepSeek	Trained on English and Chinese text. Used 10²⁴ training FLOPs for the 67B model and 10²³ FLOPs for the 7B model.^[84]^{: figure 5}
Phi-2	Dec 2023	Microsoft	2.7B	1.4T tokens	419^[85]	MIT	Trained on real and synthetic "textbook-quality" data over 14 days on 96 A100 GPUs.^[85]

2024

Name	Release date^[b]	Developer	Number of parameters	Corpus size	Training cost	License^[c]	Notes
Gemini 1.5	Feb 2024	Google DeepMind	Unknown	Unknown	Unknown	Proprietary	Multimodal model based on a MoE architecture. Context window above 1 million tokens.^[86]
Gemini Ultra	Feb 2024	Google DeepMind	Unknown	Unknown	Unknown	Proprietary
Gemma	Feb 2024	Google DeepMind	7B	6T tokens	Unknown	Gemma Terms of Use^[87]
OLMo	Feb 2024	Allen Institute for AI	7B^[88]	2T tokens^[89]	Unknown	Apache 2.0
Claude 3	Mar 2024	Anthropic	Unknown	Unknown	Unknown	Proprietary	Includes three models: Haiku, Sonnet, and Opus.^[90]
DBRX	Mar 2024	Databricks and Mosaic ML	136B	12T tokens	Unknown	Databricks Open Model^[91]^[92]
YandexGPT 3 Pro	Mar 28, 2024	Yandex	Unknown	Unknown	Unknown	Proprietary
Fugaku-LLM^[93]	May 2024	Fujitsu, Tokyo Institute of Technology, Tohoku University, RIKEN, etc.	13B	380B tokens	Unknown	Fugaku-LLM Terms of Use^[94]	The largest model ever trained using only CPUs, on the Fugaku supercomputer; it was trained from scratch on 380 billion tokens using 13,824 Fugaku nodes.^[93]^[95]
Chameleon	May 2024	Meta AI	34B^[96]	4.4T	Unknown	Non-commercial research^[97]
Mixtral 8x22B^[98]	Apr 17, 2024	Mistral AI	141B	Unknown	Unknown	Apache 2.0
Phi-3	Apr 23, 2024	Microsoft	14B^[99]	4.8T tokens^[100]	Unknown	MIT	Marketed by Microsoft as a "small language model".^[99]
Granite Code Models	May 2024	IBM	Unknown	Unknown	Unknown	Apache 2.0
YandexGPT 3 Lite	May 28, 2024	Yandex	Unknown	Unknown	Unknown	Proprietary
Qwen2	Jun 2024	Alibaba Cloud	72B^[101]	3T tokens	Unknown	Various
DeepSeek-V2	Jun 2024	DeepSeek	236B	8.1T tokens	28,000	DeepSeek	1.4M hours on H800.^[102]
Nemotron-4	Jun 2024	Nvidia	340B	9T tokens	200,000	NVIDIA Open Model^[103]^[104]	Trained for 1 epoch. Trained on 6144 H100 GPUs between December 2023 and May 2024.^[105]^[106]
Claude 3.5	Jun 2024	Anthropic	Unknown	Unknown	Unknown	Proprietary	Initially, only one model, Sonnet, was released.^[107] In October 2024, Sonnet 3.5 was upgraded, and Haiku 3.5 became available.^[108]
Llama 3.1	Jul 2024	Meta AI	405B	15.6T tokens	440,000	Llama 3	405B version took 31 million hours on H100-80GB, at 3.8E25 FLOPs.^[109]^[110]
Grok-2	Aug 14, 2024	xAI	Unknown	Unknown	Unknown	xAI Community License Agreement^[111]^[112]	Originally closed-source, then re-released as "Grok 2.5" under a source-available license in August 2025.^[113]^[114]
OpenAI o1	Sep 12, 2024	OpenAI	Unknown	Unknown	Unknown	Proprietary	First LLM described as a "reasoning model".^[115]^[116]^{[better source needed]}
Sarvam-1	Oct 24, 2024	Sarvam AI	2B	~2T tokens	Unknown	Sarvam AI Research	Supports 10 Indic languages and English^[117]^[118]
YandexGPT 4 Lite and Pro	Oct 24, 2024	Yandex	Unknown	Unknown	Unknown	Proprietary
Mistral Large	Nov 2024	Mistral AI	123B	Unknown	Unknown	Mistral Research	Upgraded over time. The latest version is 24.11.^[119]
Pixtral	Nov 2024	Mistral AI	123B	Unknown	Unknown	Mistral Research	Multimodal. There is also a 12B version which is under Apache 2 license.^[119]
OLMo 2	Nov 2024	Allen Institute for AI	32B^[120]^[121]	6.6T tokens^[121]	15,000^[121]	Apache 2.0
Phi-4	Dec 12, 2024	Microsoft	14B^[122]	9.8T tokens	Unknown	MIT	Marketed by Microsoft as a "small language model".^[123]
DeepSeek-V3	Dec 2024	DeepSeek	671B	14.8T tokens	56,000	MIT	Used 2.788M training hours on H800 GPUs.^[124] Originally released under the DeepSeek License, then re-released under the MIT License as "DeepSeek-V3-0324" in March 2025.^[125]
Amazon Nova	Dec 2024	Amazon	Unknown	Unknown	Unknown	Proprietary	Includes three models: Nova Micro, Nova Lite, and Nova Pro.^[126]

2025

Name	Release date^[b]	Developer	Number of parameters	Corpus size	License^[c]	Notes
DeepSeek-R1	Jan 20	DeepSeek	671B	Not applicable	MIT	No pretraining; reinforcement-learned upon V3-Base.^[127]^[128]
Qwen2.5	Jan 26	Alibaba	72B	18T tokens	Various	7 dense models with parameter counts from 0.5B to 72B. Alibaba also released 2 MoE variants.^[129]
MiniMax-Text-01	Jan 14	Minimax	456B	4.7T tokens^[130]	Minimax Model	^[131]^[130]
Gemini 2.0	Feb 5	Google DeepMind	Unknown	Unknown	Proprietary	Three models released: Flash, Flash-Lite and Pro.^[132]^[133]^[134]
Grok 3	Feb 19	xAI	Unknown	Unknown	Proprietary	Training cost claimed to be "10x the compute of previous state-of-the-art models".^[135]
Claude 3.7	Feb 24	Anthropic	Unknown	Unknown	Proprietary	One model, Sonnet 3.7.^[136]
YandexGPT 5 Lite Pretrain and Pro	Feb 25	Yandex	Unknown	Unknown	Proprietary
GPT-4.5	Feb 27	OpenAI	Unknown	Unknown	Proprietary	OpenAI's largest non-reasoning model at the time.^[137]
Gemini 2.5	Mar 25	Google DeepMind	Unknown	Unknown	Proprietary	Three models released: Flash, Flash-Lite and Pro.^[138]
YandexGPT 5 Lite Instruct	Mar 31	Yandex	Unknown	Unknown	Proprietary
Llama 4	Apr 5	Meta AI	400B	40T tokens	Llama 4	^[139]^[140]
OpenAI o3 and o4-mini	Apr 16	OpenAI	Unknown	Unknown	Proprietary	Reasoning models.^[141]
Qwen3	Apr 28	Alibaba Cloud	235B	36T tokens	Apache 2.0	Multiple sizes, the smallest being 0.6B.^[142]
Claude 4	May 22	Anthropic	Unknown	Unknown	Proprietary	Includes two models, Sonnet and Opus.^[143]
Sarvam-M	May 23	Sarvam AI	24B	Unknown	Apache 2.0	Hybrid reasoning model fine-tuned on Mistral Small base; optimized for math, programming, and Indian languages.^[144]^[145]
Grok 4	Jul 9	xAI	Unknown	Unknown	Proprietary	^[146]
Param-1	Jul 21	BharatGen	2.9B^[147]	5T tokens^[g]^[147]	Apache 2.0	^[148]
GLM-4.5	Jul 29	Z.ai	355B	22T tokens^[149]^[h]	MIT	Released in 355B and 106B sizes.^[150]
GPT-OSS	Aug 5	OpenAI	117B	Unknown	Apache 2.0	Released in 20B and 120B sizes.^[151]
Claude 4.1	Aug 5	Anthropic	Unknown	Unknown	Proprietary	Includes one model, Opus.^[152]
GPT-5	Aug 7	OpenAI	Unknown	Unknown	Proprietary	Includes three models: GPT-5, GPT-5 mini, and GPT-5 nano. GPT-5 is available in ChatGPT and through the API. It includes reasoning abilities.^[153]^[154]
DeepSeek-V3.1	Aug 21	DeepSeek	671B	15.639T	MIT	Based on DeepSeek V3 (trained on 14.8T tokens); further trained on 839B tokens from the extension phases (630B + 209B).^[155] A hybrid model that can switch between thinking and non-thinking modes.^[156]
YandexGPT 5.1 Pro	Aug 28	Yandex	Unknown	Unknown	Proprietary
Apertus	Sep 2	ETH Zurich and EPF Lausanne	70B	15T^[157]	Apache 2.0	The first LLM to be compliant with the Artificial Intelligence Act of the European Union.^[158]
Claude Sonnet 4.5	Sep 29	Anthropic	Unknown	Unknown	Proprietary	^[159]
GLM-4.6	Sep 30	Z.ai	357B	Unknown	Apache 2.0	^[160]^[161]^[162]
Alice AI LLM 1.0	Oct 28	Yandex	Unknown	Unknown	Proprietary
Gemini 3	Nov 18	Google DeepMind	Unknown	Unknown	Proprietary	Models released: Deep Think and Pro.^[163]
Olmo 3^[164]	Nov 20	Allen Institute for AI	32B	5.9T tokens^[165]	Apache 2.0	Includes 7B and 32B parameter versions, alongside reasoning and instruction-following models.^[165]
Claude Opus 4.5	Nov 24	Anthropic	Unknown	Unknown	Proprietary	Largest model in the Claude family.^[166]
DeepSeek-V3.2	Dec 1	DeepSeek	685B	Unknown	MIT	Uses a custom DeepSeek Sparse Attention (DSA) mechanism^[167]^[168]^[169]
GPT-5.2	Dec 11	OpenAI	Unknown	Unknown	Proprietary	Solved an open problem in statistical learning theory that human researchers had not previously resolved.^[170]
GLM-4.7	Dec 22	Z.ai	355B	Unknown	Apache 2.0

2026

Name	Release date^[b]	Developer	Number of parameters	Corpus size	License^[c]	Notes
Qwen3-Max-Thinking	Jan 26	Alibaba Cloud	Unknown	Unknown	Proprietary	Proprietary reasoning model with adaptive tool-use, test-time scaling, and iterative self-reflection.^[171]
Kimi K2.5	Jan 27	Moonshot AI	1040B	15T tokens	Modified MIT	Multimodal MoE with 32B active parameters, derived from Kimi K2.^[172] Can use "Agent Swarm" technology to coordinate up to 100 parallel sub-agents.^[173]^[174]
Step-3.5-Flash	Feb 12	StepFun	196B	Unknown	Apache 2.0	MoE model with 11B active parameters out of 196B total^[175]^[176]^[177]
Claude Opus 4.6	Feb 5	Anthropic	Unknown	Unknown	Proprietary
GPT-5.3-Codex	Feb 5	OpenAI	Unknown	Unknown	Proprietary
GLM-5	Feb 12	Z.ai	754B	Unknown	MIT
Claude Sonnet 4.6	Feb 17	Anthropic	Unknown	Unknown	Proprietary
Param-2	Feb 17	BharatGen	17B	~22T tokens	BharatGen Research^[178]	Mixture-of-experts model, successor of Param-1; many more Indic languages are supported. Trained on H100 GPUs for 24 days.^[179]
Sarvam-105B	Feb 18^[i]	Sarvam AI	105B^[181]	12T tokens^[181]	Apache 2.0	India's first independently-trained foundation model; has 105B and 30B versions. Based on a mixture-of-experts architecture that uses only 10.3B active parameters at a time.^[182] Interprets Indic languages and Hinglish.^[183]^[184]
Sarvam-30B	Feb 18^[i]	Sarvam AI	30B^[181]	16T tokens^[181]	Apache 2.0
GPT-5.4	Mar 5	OpenAI	Unknown	Unknown	Proprietary
Mistral Small 4	Mar 17	Mistral AI	119B	Unknown	Apache 2.0	MoE model with 6B active parameters out of 119B total^[185]^[186]
MiMo-V2-Pro	Mar 18	Xiaomi	1000B^[187]	Unknown	Proprietary	Mixture-of-experts (MoE) model with more than 1 trillion parameters (43 billion active). Designed for agentic scenarios. Initially available on OpenRouter under the codename "Hunter Alpha" before official release.^[188]
Gemma 4	Apr 2	Google DeepMind	31B	Unknown	Apache 2.0	Released in 31B, 26B A4B (3.8 billion active parameters), E4B (4 billion effective parameters), and E2B variants^[189]^[190]
GLM-5.1	Apr 7	Z.ai	754B	Unknown	MIT	MoE model designed for agentic coding^[191]^[192]
Muse Spark	Apr 8	Meta Superintelligence Labs	Unknown	Unknown	Proprietary	^[193]
Qwen3.6 (Qwen3.6-35B-A3B)	Apr 15	Alibaba Cloud	35B	Unknown	Apache 2.0	MoE model with 3B active parameters out of 35B total^[194]^[195]
Claude Opus 4.7	Apr 16	Anthropic	Unknown	Unknown	Proprietary
GPT-5.5	Apr 23	OpenAI	Unknown	Unknown	Proprietary
DeepSeek-V4-Flash	Apr 24	DeepSeek	284B	32T	MIT	Preview release^[196]
DeepSeek-V4-Pro	Apr 24	DeepSeek	1600B	32T	MIT	Preview release^[196]
MiMo-V2.5-Pro	Apr 27	Xiaomi	1020B	48T	MIT	MoE model designed for agentic coding and long-horizon software engineering tasks.^[197]^[198]
MiMo-V2.5	Apr 27	Xiaomi	310B	27T	MIT	Omni-modal MoE model with agentic capabilities and 1M-token context.^[199]
Gemini 3.5 Flash	May 19	Google DeepMind	Unknown	Unknown	Proprietary	^[200]
Claude Opus 4.8	May 28	Anthropic	Unknown	Unknown	Proprietary	^[201]
Step 3.7 Flash	May 29	StepFun	198B^[j]	Unknown	Apache 2.0	^[202]

Notes

^ In many cases, researchers release or report on multiple versions of a model having different sizes. In these cases, the size of the largest model is listed here.
^ ^a ^b ^c ^d ^e ^f ^g ^h ^i This is the date that documentation describing the model's architecture was first released.
^ ^a ^b ^c ^d ^e ^f ^g ^h ^i This is the license of the pre-trained model weights. In almost all cases the training code itself is open-source or can be easily replicated. LLMs may be licensed differently from the chatbots that use them; for the licenses of chatbots, see List of chatbots.
^ The smaller models including 66B are publicly available, while the 175B model is available on request.
^ Facebook's license and distribution scheme restricted access to approved researchers, but the model weights were leaked and became widely available.
^ As stated in the technical report: "Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method ..."^[58]
^ "focus[ed] on India’s linguistic landscape"
^ Corpus size was calculated by combining the 15 trillion tokens and the 7 trillion tokens pre-training mix.
^ An early checkpoint of the model was released in January.^[180]
^ 196B + 1.8B (ViT)

References

^ "Improving language understanding with unsupervised learning". openai.com. June 11, 2018. Archived from the original on 2023-03-18. Retrieved 2023-03-18.
^ "finetune-transformer-lm". GitHub. Archived from the original on 19 May 2023. Retrieved 2 January 2024.
^ Radford, Alec (11 June 2018). "Improving language understanding with unsupervised learning". OpenAI. Retrieved 18 November 2025.
^ ^a ^b ^c Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina (11 October 2018). "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". arXiv:1810.04805v2 [cs.CL].
^ Prickett, Nicole Hemsoth (2021-08-24). "Cerebras Shifts Architecture To Meet Massive AI/ML Models". The Next Platform. Archived from the original on 2023-06-20. Retrieved 2023-06-20.
^ "BERT". March 13, 2023. Archived from the original on January 13, 2021. Retrieved March 13, 2023 – via GitHub.
^ Manning, Christopher D. (2022). "Human Language Understanding & Reasoning". Daedalus. 151 (2): 127–138. doi:10.1162/daed_a_01905. S2CID 248377870. Archived from the original on 2023-11-17. Retrieved 2023-03-09.
^ Patel, Ajay; Li, Bryan; Rasooli, Mohammad Sadegh; Constant, Noah; Raffel, Colin; Callison-Burch, Chris (2022). "Bidirectional Language Models Are Also Few-shot Learners". arXiv:2209.14500 [cs.LG].
^ ^a ^b Raffel, Colin; Shazeer, Noam; Roberts, Adam; Lee, Katherine; Narang, Sharan; Matena, Michael; Zhou, Yanqi; Li, Wei; Liu, Peter J. (2020). "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer". Journal of Machine Learning Research. 21 (140): 1–67. arXiv:1910.10683. ISSN 1533-7928.
^ google-research/text-to-text-transfer-transformer, Google Research, 2024-04-02, archived from the original on 2024-03-29, retrieved 2024-04-04
^ "Imagen: Text-to-Image Diffusion Models". imagen.research.google. Archived from the original on 2024-03-27. Retrieved 2024-04-04.
^ "Pretrained models — transformers 2.0.0 documentation". huggingface.co. Archived from the original on 2024-08-05. Retrieved 2024-08-05.
^ "xlnet". GitHub. Archived from the original on 2 January 2024. Retrieved 2 January 2024.
^ Yang, Zhilin; Dai, Zihang; Yang, Yiming; Carbonell, Jaime; Salakhutdinov, Ruslan; Le, Quoc V. (2 January 2020). "XLNet: Generalized Autoregressive Pretraining for Language Understanding". arXiv:1906.08237 [cs.CL].
^ "GPT-2: 1.5B Release". OpenAI. 2019-11-05. Archived from the original on 2019-11-14. Retrieved 2019-11-14.
^ "Better language models and their implications". openai.com. Archived from the original on 2023-03-16. Retrieved 2023-03-13.
^ ^a ^b "OpenAI's GPT-3 Language Model: A Technical Overview". lambdalabs.com. 3 June 2020. Archived from the original on 27 March 2023. Retrieved 13 March 2023.
^ ^a ^b "openai-community/gpt2-xl · Hugging Face". huggingface.co. Archived from the original on 2024-07-24. Retrieved 2024-07-24.
^ "gpt-2". GitHub. Archived from the original on 11 March 2023. Retrieved 13 March 2023.
^ Wiggers, Kyle (28 April 2022). "The emerging types of language models and why they matter". TechCrunch. Archived from the original on 16 March 2023. Retrieved 9 March 2023.
^ Table D.1 in Brown, Tom B.; Mann, Benjamin; Ryder, Nick; Subbiah, Melanie; Kaplan, Jared; Dhariwal, Prafulla; Neelakantan, Arvind; Shyam, Pranav; Sastry, Girish; Askell, Amanda; Agarwal, Sandhini; Herbert-Voss, Ariel; Krueger, Gretchen; Henighan, Tom; Child, Rewon; Ramesh, Aditya; Ziegler, Daniel M.; Wu, Jeffrey; Winter, Clemens; Hesse, Christopher; Chen, Mark; Sigler, Eric; Litwin, Mateusz; Gray, Scott; Chess, Benjamin; Clark, Jack; Berner, Christopher; McCandlish, Sam; Radford, Alec; Sutskever, Ilya; Amodei, Dario (May 28, 2020). "Language Models are Few-Shot Learners". arXiv:2005.14165v4 [cs.CL].
^ "ChatGPT: Optimizing Language Models for Dialogue". OpenAI. 2022-11-30. Archived from the original on 2022-11-30. Retrieved 2023-01-13.
^ "GPT Neo". March 15, 2023. Archived from the original on March 12, 2023. Retrieved March 12, 2023 – via GitHub.
^ ^a ^b ^c Gao, Leo; Biderman, Stella; Black, Sid; Golding, Laurence; Hoppe, Travis; Foster, Charles; Phang, Jason; He, Horace; Thite, Anish; Nabeshima, Noa; Presser, Shawn; Leahy, Connor (31 December 2020). "The Pile: An 800GB Dataset of Diverse Text for Language Modeling". arXiv:2101.00027 [cs.CL].
^ ^a ^b Iyer, Abhishek (15 May 2021). "GPT-3's free alternative GPT-Neo is something to be excited about". VentureBeat. Archived from the original on 9 March 2023. Retrieved 13 March 2023.
^ "GPT-J-6B: An Introduction to the Largest Open Source GPT Model | Forefront". www.forefront.ai. Archived from the original on 2023-03-09. Retrieved 2023-02-28.
^ ^a ^b ^c ^d Dey, Nolan; Gosal, Gurpreet; Chen, Zhiming; Khachane, Hemant; Marshall, William; Pathria, Ribhu; Tom, Marvin; Hestness, Joel (2023-04-01). "Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster". arXiv:2304.03208 [cs.LG].
^ Alvi, Ali; Kharya, Paresh (11 October 2021). "Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World's Largest and Most Powerful Generative Language Model". Microsoft Research. Archived from the original on 13 March 2023. Retrieved 13 March 2023.
^ ^a ^b Smith, Shaden; Patwary, Mostofa; Norick, Brandon; LeGresley, Patrick; Rajbhandari, Samyam; Casper, Jared; Liu, Zhun; Prabhumoye, Shrimai; Zerveas, George; Korthikanti, Vijay; Zhang, Elton; Child, Rewon; Aminabadi, Reza Yazdani; Bernauer, Julie; Song, Xia (2022-02-04). "Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model". arXiv:2201.11990 [cs.CL].
^ ^a ^b Rajbhandari, Samyam; Li, Conglong; Yao, Zhewei; Zhang, Minjia; Aminabadi, Reza Yazdani; Awan, Ammar Ahmad; Rasley, Jeff; He, Yuxiong (2022-07-21), DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale, arXiv:2201.05596
^ Wang, Shuohuan; Sun, Yu; Xiang, Yang; Wu, Zhihua; Ding, Siyu; Gong, Weibao; Feng, Shikun; Shang, Junyuan; Zhao, Yanbin; Pang, Chao; Liu, Jiaxiang; Chen, Xuyi; Lu, Yuxiang; Liu, Weixin; Wang, Xi; Bai, Yangfan; Chen, Qiuliang; Zhao, Li; Li, Shiyong; Sun, Peng; Yu, Dianhai; Ma, Yanjun; Tian, Hao; Wu, Hua; Wu, Tian; Zeng, Wei; Li, Ge; Gao, Wen; Wang, Haifeng (December 23, 2021). "ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation". arXiv:2112.12731 [cs.CL].
^ "Product". Anthropic. Archived from the original on 16 March 2023. Retrieved 14 March 2023.
^ ^a ^b Askell, Amanda; Bai, Yuntao; Chen, Anna; et al. (9 December 2021). "A General Language Assistant as a Laboratory for Alignment". arXiv:2112.00861 [cs.CL].
^ Bai, Yuntao; Kadavath, Saurav; Kundu, Sandipan; et al. (15 December 2022). "Constitutional AI: Harmlessness from AI Feedback". arXiv:2212.08073 [cs.CL].
^ ^a ^b ^c Dai, Andrew M; Du, Nan (December 9, 2021). "More Efficient In-Context Learning with GLaM". ai.googleblog.com. Archived from the original on 2023-03-12. Retrieved 2023-03-09.
^ "Language modelling at scale: Gopher, ethical considerations, and retrieval". www.deepmind.com. 8 December 2021. Archived from the original on 20 March 2023. Retrieved 20 March 2023.
^ ^a ^b ^c Hoffmann, Jordan; Borgeaud, Sebastian; Mensch, Arthur; et al. (29 March 2022). "Training Compute-Optimal Large Language Models". arXiv:2203.15556 [cs.CL].
^ ^a ^b ^c ^d Table 20 and page 66 of PaLM: Scaling Language Modeling with Pathways Archived 2023-06-10 at the Wayback Machine
^ ^a ^b Cheng, Heng-Tze; Thoppilan, Romal (January 21, 2022). "LaMDA: Towards Safe, Grounded, and High-Quality Dialog Models for Everything". ai.googleblog.com. Archived from the original on 2022-03-25. Retrieved 2023-03-09.
^ Thoppilan, Romal; De Freitas, Daniel; Hall, Jamie; Shazeer, Noam; Kulshreshtha, Apoorv; Cheng, Heng-Tze; Jin, Alicia; Bos, Taylor; Baker, Leslie; Du, Yu; Li, YaGuang; Lee, Hongrae; Zheng, Huaixiu Steven; Ghafouri, Amin; Menegali, Marcelo (2022-01-01). "LaMDA: Language Models for Dialog Applications". arXiv:2201.08239 [cs.CL].
^ Black, Sidney; Biderman, Stella; Hallahan, Eric; et al. (2022-05-01). GPT-NeoX-20B: An Open-Source Autoregressive Language Model. Proceedings of BigScience Episode #5 – Workshop on Challenges & Perspectives in Creating Large Language Models. pp. 95–136. Archived from the original on 2022-12-10. Retrieved 2022-12-19.
^ ^a ^b ^c Hoffmann, Jordan; Borgeaud, Sebastian; Mensch, Arthur; Sifre, Laurent (12 April 2022). "An empirical analysis of compute-optimal large language model training". Deepmind Blog. Archived from the original on 13 April 2022. Retrieved 9 March 2023.
^ Narang, Sharan; Chowdhery, Aakanksha (April 4, 2022). "Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance". ai.googleblog.com. Archived from the original on 2022-04-04. Retrieved 2023-03-09.
^ Susan Zhang; Mona Diab; Luke Zettlemoyer. "Democratizing access to large-scale language models with OPT-175B". ai.facebook.com. Archived from the original on 2023-03-12. Retrieved 2023-03-12.
^ Zhang, Susan; Roller, Stephen; Goyal, Naman; Artetxe, Mikel; Chen, Moya; Chen, Shuohui; Dewan, Christopher; Diab, Mona; Li, Xian; Lin, Xi Victoria; Mihaylov, Todor; Ott, Myle; Shleifer, Sam; Shuster, Kurt; Simig, Daniel; Koura, Punit Singh; Sridhar, Anjali; Wang, Tianlu; Zettlemoyer, Luke (21 June 2022). "OPT: Open Pre-trained Transformer Language Models". arXiv:2205.01068 [cs.CL].
^ "metaseq/projects/OPT/chronicles at main · facebookresearch/metaseq". GitHub. Retrieved 2024-10-18.
^ ^a ^b Khrushchev, Mikhail; Vasilev, Ruslan; Petrov, Alexey; Zinov, Nikolay (2022-06-22), YaLM 100B, archived from the original on 2023-06-16, retrieved 2023-03-18
^ ^a ^b Lewkowycz, Aitor; Andreassen, Anders; Dohan, David; Dyer, Ethan; Michalewski, Henryk; Ramasesh, Vinay; Slone, Ambrose; Anil, Cem; Schlag, Imanol; Gutman-Solo, Theo; Wu, Yuhuai; Neyshabur, Behnam; Gur-Ari, Guy; Misra, Vedant (30 June 2022). "Solving Quantitative Reasoning Problems with Language Models". arXiv:2206.14858 [cs.CL].
^ "Minerva: Solving Quantitative Reasoning Problems with Language Models". ai.googleblog.com. 30 June 2022. Retrieved 20 March 2023.
^ Ananthaswamy, Anil (8 March 2023). "In AI, is bigger always better?". Nature. 615 (7951): 202–205. Bibcode:2023Natur.615..202A. doi:10.1038/d41586-023-00641-w. PMID 36890378. S2CID 257380916. Archived from the original on 16 March 2023. Retrieved 9 March 2023.
^ "bigscience/bloom · Hugging Face". huggingface.co. Archived from the original on 2023-04-12. Retrieved 2023-03-13.
^ Taylor, Ross; Kardas, Marcin; Cucurull, Guillem; Scialom, Thomas; Hartshorn, Anthony; Saravia, Elvis; Poulton, Andrew; Kerkez, Viktor; Stojnic, Robert (16 November 2022). "Galactica: A Large Language Model for Science". arXiv:2211.09085 [cs.CL].
^ "20B-parameter Alexa model sets new marks in few-shot learning". Amazon Science. 2 August 2022. Archived from the original on 15 March 2023. Retrieved 12 March 2023.
^ Soltan, Saleh; Ananthakrishnan, Shankar; FitzGerald, Jack; et al. (3 August 2022). "AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model". arXiv:2208.01448 [cs.CL].
^ "AlexaTM 20B is now available in Amazon SageMaker JumpStart | AWS Machine Learning Blog". aws.amazon.com. 17 November 2022. Archived from the original on 13 March 2023. Retrieved 13 March 2023.
^ ^a ^b "Introducing LLaMA: A foundational, 65-billion-parameter large language model". Meta AI. 24 February 2023. Archived from the original on 3 March 2023. Retrieved 9 March 2023.
^ ^a ^b ^c "The Falcon has landed in the Hugging Face ecosystem". huggingface.co. Archived from the original on 2023-06-20. Retrieved 2023-06-20.
^ "GPT-4 Technical Report" (PDF). OpenAI. 2023. Archived (PDF) from the original on March 14, 2023. Retrieved March 14, 2023.
^ Schreiner, Maximilian (2023-07-11). "GPT-4 architecture, datasets, costs and more leaked". THE DECODER. Archived from the original on 2023-07-12. Retrieved 2024-07-26.
^ Dey, Nolan (March 28, 2023). "Cerebras-GPT: A Family of Open, Compute-efficient, Large Language Models". Cerebras. Archived from the original on March 28, 2023. Retrieved March 28, 2023.
^ "Abu Dhabi-based TII launches its own version of ChatGPT". tii.ae. Archived from the original on 2023-04-03. Retrieved 2023-04-03.
^ Penedo, Guilherme; Malartic, Quentin; Hesslow, Daniel; Cojocaru, Ruxandra; Cappelli, Alessandro; Alobeidli, Hamza; Pannier, Baptiste; Almazrouei, Ebtesam; Launay, Julien (2023-06-01). "The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only". arXiv:2306.01116 [cs.CL].
^ "tiiuae/falcon-40b · Hugging Face". huggingface.co. 2023-06-09. Retrieved 2023-06-20.
^ UAE's Falcon 40B, World's Top-Ranked AI Model from Technology Innovation Institute, is Now Royalty-Free Archived 2024-02-08 at the Wayback Machine, 31 May 2023
^ ^a ^b Wu, Shijie; Irsoy, Ozan; Lu, Steven; Dabravolski, Vadim; Dredze, Mark; Gehrmann, Sebastian; Kambadur, Prabhanjan; Rosenberg, David; Mann, Gideon (March 30, 2023). "BloombergGPT: A Large Language Model for Finance". arXiv:2303.17564 [cs.LG].
^ Ren, Xiaozhe; Zhou, Pingyi; Meng, Xinfan; Huang, Xinjing; Wang, Yadao; Wang, Weichao; Li, Pengfei; Zhang, Xiaoda; Podolskiy, Alexander; Arshinov, Grigory; Bout, Andrey; Piontkovskaya, Irina; Wei, Jiansheng; Jiang, Xin; Su, Teng; Liu, Qun; Yao, Jun (March 19, 2023). "PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing". arXiv:2303.10845 [cs.CL].
^ Köpf, Andreas; Kilcher, Yannic; von Rütte, Dimitri; Anagnostidis, Sotiris; Tam, Zhi-Rui; Stevens, Keith; Barhoum, Abdullah; Duc, Nguyen Minh; Stanley, Oliver; Nagyfi, Richárd; ES, Shahul; Suri, Sameer; Glushkov, David; Dantuluri, Arnav; Maguire, Andrew (2023-04-14). "OpenAssistant Conversations – Democratizing Large Language Model Alignment". arXiv:2304.07327 [cs.CL].
^ Wrobel, Sharon. "Tel Aviv startup rolls out new advanced AI language model to rival OpenAI". The Times of Israel. ISSN 0040-7909. Archived from the original on 2023-07-24. Retrieved 2023-07-24.
^ Wiggers, Kyle (2023-04-13). "With Bedrock, Amazon enters the generative AI race". TechCrunch. Archived from the original on 2023-07-24. Retrieved 2023-07-24.
^ ^a ^b Elias, Jennifer (16 May 2023). "Google's newest A.I. model uses nearly five times more text data for training than its predecessor". CNBC. Archived from the original on 16 May 2023. Retrieved 18 May 2023.
^ "Introducing PaLM 2". Google. May 10, 2023. Archived from the original on May 18, 2023. Retrieved May 18, 2023.
^ ^a ^b ^c Gunasekar, Suriya; Zhang, Yi; Aneja, Jyoti; Mendes, Caio César Teodoro; Del Giorno, Allie; Gopi, Sivakanth; Javaheripi, Mojan; Kauffmann, Piero; de Rosa, Gustavo; Saarikivi, Olli; Salim, Adil; Shah, Shital; Behl, Harkirat Singh; Wang, Xin; Bubeck, Sébastien; Eldan, Ronen; Kalai, Adam Tauman; Lee, Yin Tat; Li, Yuanzhi (2023). "Textbooks Are All You Need". arXiv:2306.11644 [cs.CL].
^ ^a ^b "Introducing Llama 2: The Next Generation of Our Open Source Large Language Model". Meta AI. 2023. Archived from the original on 2024-01-05. Retrieved 2023-07-19.
^ "llama/MODEL_CARD.md at main · meta-llama/llama". GitHub. Archived from the original on 2024-05-28. Retrieved 2024-05-28.
^ "Claude 2". anthropic.com. Archived from the original on 15 December 2023. Retrieved 12 December 2023.
^ Nirmal, Dinesh (2023-09-07). "Building AI for business: IBM's Granite foundation models". IBM Blog. Archived from the original on 2024-07-22. Retrieved 2024-08-11.
^ "Announcing Mistral 7B". Mistral. 2023. Archived from the original on 2024-01-06. Retrieved 2023-10-06.
^ "Introducing Claude 2.1". anthropic.com. Archived from the original on 15 December 2023. Retrieved 12 December 2023.
^ xai-org/grok-1, xai-org, 2024-03-19, archived from the original on 2024-05-28, retrieved 2024-03-19
^ "Grok-1 model card". x.ai. Retrieved 12 December 2023.
^ "Gemini – Google DeepMind". deepmind.google. Archived from the original on 8 December 2023. Retrieved 12 December 2023.
^ Franzen, Carl (11 December 2023). "Mistral shocks AI community as latest open source model eclipses GPT-3.5 performance". VentureBeat. Archived from the original on 11 December 2023. Retrieved 12 December 2023.
^ "Mixtral of experts". mistral.ai. 11 December 2023. Archived from the original on 13 February 2024. Retrieved 12 December 2023.
^ ^a ^b DeepSeek-AI; Bi, Xiao; Chen, Deli; Chen, Guanting; Chen, Shanhuang; Dai, Damai; Deng, Chengqi; Ding, Honghui; Dong, Kai (2024-01-05), DeepSeek LLM: Scaling Open-Source Language Models with Longtermism, arXiv:2401.02954
^ ^a ^b Hughes, Alyssa (12 December 2023). "Phi-2: The surprising power of small language models". Microsoft Research. Archived from the original on 12 December 2023. Retrieved 13 December 2023.
^ "Our next-generation model: Gemini 1.5". Google. 15 February 2024. Archived from the original on 16 February 2024. Retrieved 16 February 2024. This means 1.5 Pro can process vast amounts of information in one go — including 1 hour of video, 11 hours of audio, codebases with over 30,000 lines of code or over 700,000 words. In our research, we've also successfully tested up to 10 million tokens.
^ "Gemma" – via GitHub.
^ "OLMo: Open Language Model | Ai2". allenai.org. Retrieved 2026-03-17.
^ Groeneveld, Dirk; Beltagy, Iz; Walsh, Pete; Bhagia, Akshita; Kinney, Rodney; Tafjord, Oyvind; Jha, Ananya Harsh; Ivison, Hamish; Magnusson, Ian (2024-06-07), OLMo: Accelerating the Science of Language Models, arXiv, doi:10.48550/arXiv.2402.00838, arXiv:2402.00838, retrieved 2026-03-17
^ "Introducing the next generation of Claude". www.anthropic.com. Archived from the original on 2024-03-04. Retrieved 2024-03-04.
^ "Databricks Open Model License". Databricks. 27 March 2024. Retrieved 6 August 2025.
^ "Databricks Open Model Acceptable Use Policy". Databricks. 27 March 2024. Retrieved 6 August 2025.
^ ^a ^b "Release of "Fugaku-LLM" - a large language model trained on the supercomputer "Fugaku"". Fujitsu. 10 May 2024. Retrieved 20 April 2026.
^ "Fugaku-LLM Terms of Use". 23 April 2024. Retrieved 6 August 2025 – via Hugging Face.
^ "Fugaku-LLM/Fugaku-LLM-13B · Hugging Face". huggingface.co. Archived from the original on 2024-05-17. Retrieved 2024-05-17.
^ Dickson, Ben (22 May 2024). "Meta introduces Chameleon, a state-of-the-art multimodal model". VentureBeat.
^ "chameleon/LICENSE at e3b711ef63b0bb3a129cf0cf0918e36a32f26e2c · facebookresearch/chameleon". Meta Research. Retrieved 6 August 2025 – via GitHub.
^ Mistral AI (2024-04-17). "Cheaper, Better, Faster, Stronger". mistral.ai. Archived from the original on 2024-05-05. Retrieved 2024-05-05.
^ ^a ^b Bilenko, Misha (23 April 2024). "Introducing Phi-3: Redefining what's possible with SLMs". azure.microsoft.com. Archived from the original on 8 May 2026. Retrieved 8 May 2026.
^ Abdin, Marah; et al. (2024). "Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone". arXiv:2404.14219 [cs.CL].
^ "Qwen2". GitHub. Archived from the original on 2024-06-17. Retrieved 2024-06-17.
^ DeepSeek-AI; Liu, Aixin; Feng, Bei; Wang, Bin; Wang, Bingxuan; Liu, Bo; Zhao, Chenggang; Deng, Chengqi; Ruan, Chong (2024-06-19), DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model, arXiv:2405.04434
^ "NVIDIA Open Models License". Nvidia. 16 June 2025. Retrieved 6 August 2025.
^ "Trustworthy AI". Nvidia. 27 June 2024. Retrieved 6 August 2025.
^ "nvidia/Nemotron-4-340B-Base · Hugging Face". huggingface.co. 2024-06-14. Archived from the original on 2024-06-15. Retrieved 2024-06-15.
^ "Nemotron-4 340B | Research". research.nvidia.com. Archived from the original on 2024-06-15. Retrieved 2024-06-15.
^ "Introducing Claude 3.5 Sonnet". www.anthropic.com. Retrieved 8 August 2025.
^ "Introducing computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku". www.anthropic.com. Retrieved 8 August 2025.
^ Llama Team, AI @ Meta (July 23, 2024). "The Llama 3 Herd of Models".
^ "llama-models/models/llama3_1/MODEL_CARD.md at main · meta-llama/llama-models". GitHub. Archived from the original on 2024-07-23. Retrieved 2024-07-23.
^ "LICENSE · xai-org/grok-2 at main". 5 November 2025. Retrieved 18 November 2025 – via Hugging Face.
^ "xAI Acceptable Use Policy". xAI. 2 January 2025. Retrieved 18 November 2025.
^ Weatherbed, Jess (14 August 2024). "xAI's new Grok-2 chatbots bring AI image generation to X". The Verge. Retrieved 18 November 2025.
^ Ha, Anthony (24 August 2025). "Elon Musk says xAI has open sourced Grok 2.5". TechCrunch. Retrieved 18 November 2025.
^ "Introducing OpenAI o1". openai.com. Retrieved 8 August 2025.
^ Paul, Katie; Tong, Anna (13 September 2024). "OpenAI launches new series of AI models with 'reasoning' abilities". Reuters.
^ Jindal, Siddharth (24 October 2024). "Sarvam AI Launches Sarvam-1, Outperforms Gemma-2 and Llama-3.2". Analytics India Magazine. Archived from the original on 25 July 2025. Retrieved 20 April 2026.
^ "LICENSE.md · sarvamai/sarvam-1". 23 October 2024. Retrieved 20 April 2026 – via Hugging Face.
^ ^a ^b "Models Overview". mistral.ai. Retrieved 2025-03-03.
^ "OLMo 2: The best fully open language model to date | Ai2". allenai.org. Retrieved 2026-03-17.
^ ^a ^b ^c Team OLMo; Walsh, Pete; Soldaini, Luca; Groeneveld, Dirk; Lo, Kyle; Arora, Shane; Bhagia, Akshita; Gu, Yuling; Huang, Shengyi (2025-10-08), 2 OLMo 2 Furious, arXiv, doi:10.48550/arXiv.2501.00656, arXiv:2501.00656, retrieved 2026-03-17
^ "Phi-4 Model Card". huggingface.co. Retrieved 2025-11-11.
^ "Introducing Phi-4: Microsoft's Newest Small Language Model Specializing in Complex Reasoning". techcommunity.microsoft.com. Retrieved 2025-11-11.
^ deepseek-ai/DeepSeek-V3, DeepSeek, 2024-12-26, retrieved 2024-12-26
^ Feng, Coco (25 March 2025). "DeepSeek wows coders with more powerful open-source V3 model". South China Morning Post. Retrieved 6 April 2025.
^ Amazon Nova Micro, Lite, and Pro - AWS AI Service Cards, Amazon, 2024-12-27, retrieved 2024-12-27
^ deepseek-ai/DeepSeek-R1, DeepSeek, 2025-01-21, retrieved 2025-01-21
^ DeepSeek-AI; Guo, Daya; Yang, Dejian; Zhang, Haowei; Song, Junxiao; Zhang, Ruoyu; Xu, Runxin; Zhu, Qihao; Ma, Shirong (2025-01-22), DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning, arXiv:2501.12948
^ Qwen; Yang, An; Yang, Baosong; Zhang, Beichen; Hui, Binyuan; Zheng, Bo; Yu, Bowen; Li, Chengyuan; Liu, Dayiheng (2025-01-03), Qwen2.5 Technical Report, arXiv:2412.15115
^ ^a ^b MiniMax; Li, Aonian; Gong, Bangwei; Yang, Bo; Shan, Boji; Liu, Chang; Zhu, Cheng; Zhang, Chunhao; Guo, Congchao (2025-01-14), MiniMax-01: Scaling Foundation Models with Lightning Attention, arXiv:2501.08313
^ MiniMax-AI/MiniMax-01, MiniMax, 2025-01-26, retrieved 2025-01-26
^ Kavukcuoglu, Koray (5 February 2025). "Gemini 2.0 is now available to everyone". Google. Retrieved 6 February 2025.
^ "Gemini 2.0: Flash, Flash-Lite and Pro". Google for Developers. Retrieved 6 February 2025.
^ Franzen, Carl (5 February 2025). "Google launches Gemini 2.0 Pro, Flash-Lite and connects reasoning model Flash Thinking to YouTube, Maps and Search". VentureBeat. Retrieved 6 February 2025.
^ "Grok 3 Beta — The Age of Reasoning Agents". x.ai. Retrieved 2025-02-22.
^ "Claude 3.7 Sonnet and Claude Code". www.anthropic.com. Retrieved 8 August 2025.
^ "Introducing GPT-4.5". openai.com. Retrieved 8 August 2025.
^ Kavukcuoglu, Koray (25 March 2025). "Gemini 2.5: Our most intelligent AI model". Google. Retrieved 23 September 2025.
^ "meta-llama/Llama-4-Maverick-17B-128E · Hugging Face". huggingface.co. 2025-04-05. Retrieved 2025-04-06.
^ "The Llama 4 herd: The beginning of a new era of natively multimodal AI innovation". ai.meta.com. Archived from the original on 2025-04-05. Retrieved 2025-04-05.
^ "Introducing OpenAI o3 and o4-mini". openai.com. Retrieved 8 August 2025.
^ Qwen Team (2025-04-29). "Qwen3: Think Deeper, Act Faster". Qwen. Retrieved 2025-04-29.
^ "Introducing Claude 4". www.anthropic.com. Retrieved 8 August 2025.
^ Yadav, Nandini (2025-05-26). "Indian AI startup launches Sarvam-M model: What is it, why is everyone talking about it". India Today. Retrieved 2026-03-18.
^ "Sarvam-M: Open Source Hybrid Indic LLM | Sarvam AI". Sarvam AI. 2025-05-23. Retrieved 2026-03-18.
^ "Grok 4". x.ai. 9 July 2025. Archived from the original on 9 May 2026. Retrieved 9 May 2026.
^ ^a ^b Pundalik, Kundeshwar; Sawarkar, Piyush; Sahoo, Nihar; Shinde, Abhishek; Chanda, Prateek; Goswami, Vedant; Nagpal, Ajay; Singh, Atul; Thakur, Viraj (2025-07-16), PARAM-1 BharatGen 2.9B Model, arXiv, doi:10.48550/arXiv.2507.13390, arXiv:2507.13390, retrieved 2026-03-18
^ "README.md · bharatgenai/Param-1". 24 February 2026. Retrieved 12 April 2026 – via Hugging Face.
^ "GLM-4.5: Reasoning, Coding, and Agentic Abililties". z.ai. Retrieved 2025-08-06.
^ "zai-org/GLM-4.5 · Hugging Face". huggingface.co. 2025-08-04. Retrieved 2025-08-06.
^ Whitwam, Ryan (5 August 2025). "OpenAI announces two "gpt-oss" open AI models, and you can download them today". Ars Technica. Retrieved 6 August 2025.
^ "Claude Opus 4.1". www.anthropic.com. Retrieved 8 August 2025.
^ "Introducing GPT-5". openai.com. 7 August 2025. Retrieved 8 August 2025.
^ "OpenAI Platform: GPT-5 Model Documentation". openai.com. Retrieved 18 August 2025.
^ "deepseek-ai/DeepSeek-V3.1 · Hugging Face". huggingface.co. 2025-08-21. Retrieved 2025-08-25.
^ "DeepSeek-V3.1 Release | DeepSeek API Docs". api-docs.deepseek.com. Retrieved 2025-08-25.
^ "Apertus: Ein vollständig offenes, transparentes und mehrsprachiges Sprachmodell" (in German). Zürich: ETH Zürich. 2025-09-02. Retrieved 2025-11-07.
^ Kirchner, Malte (2025-09-02). "Apertus: Schweiz stellt erstes offenes und mehrsprachiges KI-Modell vor". heise online (in German). Retrieved 2025-11-07.
^ "Introducing Claude Sonnet 4.5". www.anthropic.com. Retrieved 29 September 2025.
^ "GLM-4.6: Advanced Agentic, Reasoning and Coding Capabilities". z.ai. Retrieved 2025-10-01.
^ "zai-org/GLM-4.6 · Hugging Face". huggingface.co. 2025-09-30. Retrieved 2025-10-01.
^ "GLM-4.6". modelscope.cn. Retrieved 2025-10-01.
^ "A new era of intelligence with Gemini 3". Google. 18 November 2025. Retrieved 5 January 2026.
^ "Olmo 3: Charting a path through the model flow to lead open-source AI". Ai2. 20 November 2025.
^ ^a ^b Team Olmo; Ettinger, Allyson; Bertsch, Amanda; Kuehl, Bailey; Graham, David; Heineman, David; Groeneveld, Dirk; Brahman, Faeze; Timbers, Finbarr (2025-12-15), Olmo 3, arXiv, doi:10.48550/arXiv.2512.13961, arXiv:2512.13961, retrieved 2026-03-17
^ "Introducing Claude Opus 4.5". www.anthropic.com. Retrieved 8 January 2026.
^ Binder, Matt (3 December 2025). "DeepSeek v3.2: What it is, how it compares to ChatGPT, how to try it". Mashable. Retrieved 12 April 2026.
^ "DeepSeek-V3.2 Release". DeepSeek API Docs. 1 December 2025. Retrieved 12 April 2026.
^ "DeepSeek-V3.2: Efficient Reasoning & Agentic AI". Hugging Face. 1 December 2025. Retrieved 12 April 2026.
^ "Advancing science and math with GPT-5.2". openai.com. Retrieved 4 January 2026.
^ "Pushing Qwen3-Max-Thinking Beyond its Limits". Qwen. 25 January 2026. Archived from the original on 6 February 2026. Retrieved 6 February 2026. We further enhance Qwen3-Max-Thinking with two key innovations: (1) adaptive tool-use capabilities [...]; and (2) advanced test-time scaling techniques [...]. [...] We limit [parallel trajectories] and redirect saved computation to iterative self-reflection guided by a "take-experience" mechanism.
^ Kimi Team; Bai, Yifan; Bao, Yiping; Charles, Y.; Chen, Cheng; Chen, Guanduo; Chen, Haiting; Chen, Huarong; Chen, Jiahao (2026-02-03), Kimi K2: Open Agentic Intelligence, arXiv, doi:10.48550/arXiv.2507.20534, arXiv:2507.20534, retrieved 2026-03-18
^ Kimi Team; Bai, Tongtong; Bai, Yifan; Bao, Yiping; Cai, S. H.; Cao, Yuan; Charles, Y.; Che, H. S.; Chen, Cheng (2026-02-02), Kimi K2.5: Visual Agentic Intelligence, arXiv, doi:10.48550/arXiv.2602.02276, arXiv:2602.02276, retrieved 2026-03-18
^ "Kimi K2.5: Chat with Kimi K2.5 for Free". Kimi K2.5. Retrieved 2026-03-18.
^ Jiang, Ben (3 February 2026). "Compact AI model from China's StepFun outshines rivals from DeepSeek, Moonshot". South China Morning Post. Archived from the original on 4 February 2026. Retrieved 14 April 2026.
^ "Step 3.5 Flash: Fast Enough to Think. Reliable Enough to Act". StepFun. 12 February 2026. Retrieved 20 April 2026.
^ "stepfun-ai/Step-3.5-Flash". 14 March 2026. Retrieved 14 April 2026 – via Hugging Face.
^ "LICENSE · bharatgenai/Param2-17B-A2.4B-Thinking". 16 February 2026. Retrieved 12 April 2026 – via Hugging Face.
^ "bharatgenai/Param2-17B-A2.4B-Thinking". Retrieved 2026-03-08 – via Hugging Face.
^ "sarvamai/sarvam-1-v0.5 · Hugging Face". huggingface.co. Retrieved 2026-03-08.
^ ^a ^b ^c ^d "Open-Sourcing Sarvam 30B and 105B". Sarvam AI. 6 March 2026. Archived from the original on 8 May 2026. Retrieved 8 May 2026.
^ "sarvamai/sarvam-105b · Hugging Face". huggingface.co. Retrieved 2026-03-08.
^ Kumar, Abhijeet (19 February 2026). "Why Sarvam's new 105B model marks a shift in India's sovereign AI ambitions". Business Standard.
^ Singh, Jagmeet (2026-02-18). "Indian AI lab Sarvam's new models are a major bet on the viability of open source AI". TechCrunch. Retrieved 2026-03-18.
^ Marquez, Javier (17 March 2026). "Una IA para reunir todas las funciones posibles: la apuesta de Mistral con Small 4 es hacer más con menos cosas" [An AI to bring together all possible functions: Mistral's bet with Small 4 is to do more with less]. Xataka (in Spanish). Retrieved 20 April 2026.
^ "Introducing Mistral Small 4". Mistral AI. Retrieved 20 April 2026.
^ "Xiaomi Launches Powerful AI Model MiMo-V2 Pro With 1 Trillion Parametres, 1 Million Token Context Window". NDTV Profit. 19 March 2026.
^ "Mystery AI model revealed to be Xiaomi's following suspicions it was DeepSeek's". Reuters. 18 March 2026. Retrieved 3 April 2026.
^ Whitwam, Ryan (2 April 2026). "Google announces Gemma 4 open AI models, switches to Apache 2.0 license". Ars Technica. Retrieved 3 April 2026.
^ Mann, Tobias (2 April 2026). "Google battles Chinese open weights models with Gemma 4". Retrieved 3 April 2026.
^ Franzen, Carl (7 April 2026). "AI joins the 8-hour work day as GLM ships 5.1 open source LLM, beating Opus 4.6 and GPT-5.4 on SWE-Bench Pro". VentureBeat. Retrieved 12 April 2026.
^ "GLM-5.1: Towards Long-Horizon Tasks". Z.ai. Retrieved 12 April 2026.
^ "Introducing Muse Spark: Scaling Towards Personal Superintelligence". ai.meta.com. 8 April 2026. Archived from the original on 9 May 2026. Retrieved 9 May 2026.
^ "A Chinese AI called 'Qwen3.6-35B-A3B,' which is more powerful than Gemma4, has been released as an open model". Gigazine. 17 April 2026. Retrieved 17 April 2026.
^ "README.md · Qwen/Qwen3.6-35B-A3B". 15 April 2026. Retrieved 17 April 2026 – via Hugging Face.
^ Butts, Dylan (24 April 2026). "China's DeepSeek releases preview of long-awaited V4 model as AI race intensifies". CNBC.
^ "MiMo-V2.5-Pro | Xiaomi". mimo.xiaomi.com. Retrieved 2026-05-03.
^ Thomas, Prasanth Aby (28 April 2026). "Xiaomi releases MIT‑licensed MiMo models for long‑running AI agents". Computerworld. Retrieved 6 May 2026.
^ "XiaomiMiMo/MiMo-V2.5". Hugging Face. XiaomiMiMo. Retrieved 3 May 2026.
^ "Gemini 3.5: frontier intelligence with action". Google. 19 May 2026.
^ "Introducing Claude Opus 4.8". Anthropic. 28 May 2026. Archived from the original on 30 May 2026. Retrieved 30 May 2026.
^ "Step 3.7 Flash". StepFun. 29 May 2026. Archived from the original on 30 May 2026. Retrieved 30 May 2026.

[1] In many cases, researchers release or report on multiple versions of a model having different sizes. In these cases, the size of the largest model is listed here.

[release_date_note-2] This is the date that documentation describing the model's architecture was first released.

[license_note-3] This is the license of the pre-trained model weights. In almost all cases the training code itself is open-source or can be easily replicated. LLMs may be licensed differently from the chatbots that use them; for the licenses of chatbots, see List of chatbots.

[49] The smaller models including 66B are publicly available, while the 175B model is available on request.

[62] Facebook's license and distribution scheme restricted access to approved researchers, but the model weights were leaked and became widely available.

[64] As stated in the technical report: "Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method ..."^[58]

[154] "focus[ed] on India’s linguistic landscape"

[157] Corpus size was calculated by adding the 15-trillion-token and 7-trillion-token pre-training mixes.

[189] An early checkpoint of the model was released in January.^[180]

[211] 196B + 1.8B (ViT)

[oai-unsup-4] "Improving language understanding with unsupervised learning". openai.com. June 11, 2018. Archived from the original on 2023-03-18. Retrieved 2023-03-18.

[5] "finetune-transformer-lm". GitHub. Archived from the original on 19 May 2023. Retrieved 2 January 2024.

[6] Radford, Alec (11 June 2018). "Improving language understanding with unsupervised learning". OpenAI. Retrieved 18 November 2025.

[bert-paper-7] Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina (11 October 2018). "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". arXiv:1810.04805v2 [cs.CL].

[bHZJ2-8] Prickett, Nicole Hemsoth (2021-08-24). "Cerebras Shifts Architecture To Meet Massive AI/ML Models". The Next Platform. Archived from the original on 2023-06-20. Retrieved 2023-06-20.

[bert-web-9] "BERT". March 13, 2023. Archived from the original on January 13, 2021. Retrieved March 13, 2023 – via GitHub.

[Manning-2022-10] Manning, Christopher D. (2022). "Human Language Understanding & Reasoning". Daedalus. 151 (2): 127–138. doi:10.1162/daed_a_01905. S2CID 248377870. Archived from the original on 2023-11-17. Retrieved 2023-03-09.

[Ir545-11] Patel, Ajay; Li, Bryan; Rasooli, Mohammad Sadegh; Constant, Noah; Raffel, Colin; Callison-Burch, Chris (2022). "Bidirectional Language Models Are Also Few-shot Learners". arXiv:2209.14500 [cs.LG].

[:6-12] Raffel, Colin; Shazeer, Noam; Roberts, Adam; Lee, Katherine; Narang, Sharan; Matena, Michael; Zhou, Yanqi; Li, Wei; Liu, Peter J. (2020). "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer". Journal of Machine Learning Research. 21 (140): 1–67. arXiv:1910.10683. ISSN 1533-7928.

[13] google-research/text-to-text-transfer-transformer, Google Research, 2024-04-02, archived from the original on 2024-03-29, retrieved 2024-04-04

[14] "Imagen: Text-to-Image Diffusion Models". imagen.research.google. Archived from the original on 2024-03-27. Retrieved 2024-04-04.

[15] "Pretrained models — transformers 2.0.0 documentation". huggingface.co. Archived from the original on 2024-08-05. Retrieved 2024-08-05.

[xlnet-16] "xlnet". GitHub. Archived from the original on 2 January 2024. Retrieved 2 January 2024.

[LX3rI-17] Yang, Zhilin; Dai, Zihang; Yang, Yiming; Carbonell, Jaime; Salakhutdinov, Ruslan; Le, Quoc V. (2 January 2020). "XLNet: Generalized Autoregressive Pretraining for Language Understanding". arXiv:1906.08237 [cs.CL].

[15Brelease-18] "GPT-2: 1.5B Release". OpenAI. 2019-11-05. Archived from the original on 2019-11-14. Retrieved 2019-11-14.

[5T8u5-19] "Better language models and their implications". openai.com. Archived from the original on 2023-03-16. Retrieved 2023-03-13.

[LambdaLabs-20] "OpenAI's GPT-3 Language Model: A Technical Overview". lambdalabs.com. 3 June 2020. Archived from the original on 27 March 2023. Retrieved 13 March 2023.

[:10-21] "openai-community/gpt2-xl · Hugging Face". huggingface.co. Archived from the original on 2024-07-24. Retrieved 2024-07-24.

[Sudbe-22] "gpt-2". GitHub. Archived from the original on 11 March 2023. Retrieved 13 March 2023.

[Wiggers-23] Wiggers, Kyle (28 April 2022). "The emerging types of language models and why they matter". TechCrunch. Archived from the original on 16 March 2023. Retrieved 9 March 2023.

[:2-24] Table D.1 in Brown, Tom B.; Mann, Benjamin; Ryder, Nick; Subbiah, Melanie; Kaplan, Jared; Dhariwal, Prafulla; Neelakantan, Arvind; Shyam, Pranav; Sastry, Girish; Askell, Amanda; Agarwal, Sandhini; Herbert-Voss, Ariel; Krueger, Gretchen; Henighan, Tom; Child, Rewon; Ramesh, Aditya; Ziegler, Daniel M.; Wu, Jeffrey; Winter, Clemens; Hesse, Christopher; Chen, Mark; Sigler, Eric; Litwin, Mateusz; Gray, Scott; Chess, Benjamin; Clark, Jack; Berner, Christopher; McCandlish, Sam; Radford, Alec; Sutskever, Ilya; Amodei, Dario (May 28, 2020). "Language Models are Few-Shot Learners". arXiv:2005.14165v4 [cs.CL].

[chatgpt-blog-25] "ChatGPT: Optimizing Language Models for Dialogue". OpenAI. 2022-11-30. Archived from the original on 2022-11-30. Retrieved 2023-01-13.

[gpt-neo-26] "GPT Neo". March 15, 2023. Archived from the original on March 12, 2023. Retrieved March 12, 2023 – via GitHub.

[Pile-27] Gao, Leo; Biderman, Stella; Black, Sid; Golding, Laurence; Hoppe, Travis; Foster, Charles; Phang, Jason; He, Horace; Thite, Anish; Nabeshima, Noa; Presser, Shawn; Leahy, Connor (31 December 2020). "The Pile: An 800GB Dataset of Diverse Text for Language Modeling". arXiv:2101.00027 [cs.CL].

[vb-gpt-neo-28] Iyer, Abhishek (15 May 2021). "GPT-3's free alternative GPT-Neo is something to be excited about". VentureBeat. Archived from the original on 9 March 2023. Retrieved 13 March 2023.

[JxohJ-29] "GPT-J-6B: An Introduction to the Largest Open Source GPT Model | Forefront". www.forefront.ai. Archived from the original on 2023-03-09. Retrieved 2023-02-28.

[:3-30] Dey, Nolan; Gosal, Gurpreet; Chen, Zhiming; Khachane, Hemant; Marshall, William; Pathria, Ribhu; Tom, Marvin; Hestness, Joel (2023-04-01). "Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster". arXiv:2304.03208 [cs.LG].

[BwnW5-31] Alvi, Ali; Kharya, Paresh (11 October 2021). "Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World's Largest and Most Powerful Generative Language Model". Microsoft Research. Archived from the original on 13 March 2023. Retrieved 13 March 2023.

[mtnlg-preprint-32] Smith, Shaden; Patwary, Mostofa; Norick, Brandon; LeGresley, Patrick; Rajbhandari, Samyam; Casper, Jared; Liu, Zhun; Prabhumoye, Shrimai; Zerveas, George; Korthikanti, Vijay; Zhang, Elton; Child, Rewon; Aminabadi, Reza Yazdani; Bernauer, Julie; Song, Xia (2022-02-04). "Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model". arXiv:2201.11990 [cs.CL].

[:11-33] Rajbhandari, Samyam; Li, Conglong; Yao, Zhewei; Zhang, Minjia; Aminabadi, Reza Yazdani; Awan, Ammar Ahmad; Rasley, Jeff; He, Yuxiong (2022-07-21), DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale, arXiv:2201.05596

[qeOB8-34] Wang, Shuohuan; Sun, Yu; Xiang, Yang; Wu, Zhihua; Ding, Siyu; Gong, Weibao; Feng, Shikun; Shang, Junyuan; Zhao, Yanbin; Pang, Chao; Liu, Jiaxiang; Chen, Xuyi; Lu, Yuxiang; Liu, Weixin; Wang, Xi; Bai, Yangfan; Chen, Qiuliang; Zhao, Li; Li, Shiyong; Sun, Peng; Yu, Dianhai; Ma, Yanjun; Tian, Hao; Wu, Hua; Wu, Tian; Zeng, Wei; Li, Ge; Gao, Wen; Wang, Haifeng (December 23, 2021). "ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation". arXiv:2112.12731 [cs.CL].

[i8jc4-35] "Product". Anthropic. Archived from the original on 16 March 2023. Retrieved 14 March 2023.

[AnthroArch-36] Askell, Amanda; Bai, Yuntao; Chen, Anna; et al. (9 December 2021). "A General Language Assistant as a Laboratory for Alignment". arXiv:2112.00861 [cs.CL].

[RZqhw-37] Bai, Yuntao; Kadavath, Saurav; Kundu, Sandipan; et al. (15 December 2022). "Constitutional AI: Harmlessness from AI Feedback". arXiv:2212.08073 [cs.CL].

[glam-blog-38] Dai, Andrew M; Du, Nan (December 9, 2021). "More Efficient In-Context Learning with GLaM". ai.googleblog.com. Archived from the original on 2023-03-12. Retrieved 2023-03-09.

[mD5eE-39] "Language modelling at scale: Gopher, ethical considerations, and retrieval". www.deepmind.com. 8 December 2021. Archived from the original on 20 March 2023. Retrieved 20 March 2023.

[hoffman-40] Hoffmann, Jordan; Borgeaud, Sebastian; Mensch, Arthur; et al. (29 March 2022). "Training Compute-Optimal Large Language Models". arXiv:2203.15556 [cs.CL].

[:4-41] Table 20 and page 66 of "PaLM: Scaling Language Modeling with Pathways". Archived from the original on 2023-06-10.

[lamda-blog-42] Cheng, Heng-Tze; Thoppilan, Romal (January 21, 2022). "LaMDA: Towards Safe, Grounded, and High-Quality Dialog Models for Everything". ai.googleblog.com. Archived from the original on 2022-03-25. Retrieved 2023-03-09.

[DMs9Z-43] Thoppilan, Romal; De Freitas, Daniel; Hall, Jamie; Shazeer, Noam; Kulshreshtha, Apoorv; Cheng, Heng-Tze; Jin, Alicia; Bos, Taylor; Baker, Leslie; Du, Yu; Li, YaGuang; Lee, Hongrae; Zheng, Huaixiu Steven; Ghafouri, Amin; Menegali, Marcelo (2022-01-01). "LaMDA: Language Models for Dialog Applications". arXiv:2201.08239 [cs.CL].

[gpt-neox-20b-44] Black, Sidney; Biderman, Stella; Hallahan, Eric; et al. (2022-05-01). GPT-NeoX-20B: An Open-Source Autoregressive Language Model. Proceedings of BigScience Episode #5 – Workshop on Challenges & Perspectives in Creating Large Language Models. pp. 95–136. Archived from the original on 2022-12-10. Retrieved 2022-12-19.

[chinchilla-blog-45] Hoffmann, Jordan; Borgeaud, Sebastian; Mensch, Arthur; Sifre, Laurent (12 April 2022). "An empirical analysis of compute-optimal large language model training". Deepmind Blog. Archived from the original on 13 April 2022. Retrieved 9 March 2023.

[palm-blog-46] Narang, Sharan; Chowdhery, Aakanksha (April 4, 2022). "Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance". ai.googleblog.com. Archived from the original on 2022-04-04. Retrieved 2023-03-09.

[jlof8-47] Susan Zhang; Mona Diab; Luke Zettlemoyer. "Democratizing access to large-scale language models with OPT-175B". ai.facebook.com. Archived from the original on 2023-03-12. Retrieved 2023-03-12.

[QjTIc-48] Zhang, Susan; Roller, Stephen; Goyal, Naman; Artetxe, Mikel; Chen, Moya; Chen, Shuohui; Dewan, Christopher; Diab, Mona; Li, Xian; Lin, Xi Victoria; Mihaylov, Todor; Ott, Myle; Shleifer, Sam; Shuster, Kurt; Simig, Daniel; Koura, Punit Singh; Sridhar, Anjali; Wang, Tianlu; Zettlemoyer, Luke (21 June 2022). "OPT: Open Pre-trained Transformer Language Models". arXiv:2205.01068 [cs.CL].

[50] "metaseq/projects/OPT/chronicles at main · facebookresearch/metaseq". GitHub. Retrieved 2024-10-18.

[yalm-repo-51] Khrushchev, Mikhail; Vasilev, Ruslan; Petrov, Alexey; Zinov, Nikolay (2022-06-22), YaLM 100B, archived from the original on 2023-06-16, retrieved 2023-03-18

[minerva-paper-52] Lewkowycz, Aitor; Andreassen, Anders; Dohan, David; Dyer, Ethan; Michalewski, Henryk; Ramasesh, Vinay; Slone, Ambrose; Anil, Cem; Schlag, Imanol; Gutman-Solo, Theo; Wu, Yuhuai; Neyshabur, Behnam; Gur-Ari, Guy; Misra, Vedant (30 June 2022). "Solving Quantitative Reasoning Problems with Language Models". arXiv:2206.14858 [cs.CL].

[FfCNK-53] "Minerva: Solving Quantitative Reasoning Problems with Language Models". ai.googleblog.com. 30 June 2022. Retrieved 20 March 2023.

[bigger-better-54] Ananthaswamy, Anil (8 March 2023). "In AI, is bigger always better?". Nature. 615 (7951): 202–205. Bibcode:2023Natur.615..202A. doi:10.1038/d41586-023-00641-w. PMID 36890378. S2CID 257380916. Archived from the original on 16 March 2023. Retrieved 9 March 2023.

[B8wB2-55] "bigscience/bloom · Hugging Face". huggingface.co. Archived from the original on 2023-04-12. Retrieved 2023-03-13.

[37sY6-56] Taylor, Ross; Kardas, Marcin; Cucurull, Guillem; Scialom, Thomas; Hartshorn, Anthony; Saravia, Elvis; Poulton, Andrew; Kerkez, Viktor; Stojnic, Robert (16 November 2022). "Galactica: A Large Language Model for Science". arXiv:2211.09085 [cs.CL].

[u5szh-57] "20B-parameter Alexa model sets new marks in few-shot learning". Amazon Science. 2 August 2022. Archived from the original on 15 March 2023. Retrieved 12 March 2023.

[HaA7l-58] Soltan, Saleh; Ananthakrishnan, Shankar; FitzGerald, Jack; et al. (3 August 2022). "AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model". arXiv:2208.01448 [cs.CL].

[rpehM-59] "AlexaTM 20B is now available in Amazon SageMaker JumpStart | AWS Machine Learning Blog". aws.amazon.com. 17 November 2022. Archived from the original on 13 March 2023. Retrieved 13 March 2023.

[llama-blog-60] "Introducing LLaMA: A foundational, 65-billion-parameter large language model". Meta AI. 24 February 2023. Archived from the original on 3 March 2023. Retrieved 9 March 2023.

[:5-61] "The Falcon has landed in the Hugging Face ecosystem". huggingface.co. Archived from the original on 2023-06-20. Retrieved 2023-06-20.

[GPT4Tech-63] "GPT-4 Technical Report" (PDF). OpenAI. 2023. Archived (PDF) from the original on March 14, 2023. Retrieved March 14, 2023.

[65] Schreiner, Maximilian (2023-07-11). "GPT-4 architecture, datasets, costs and more leaked". THE DECODER. Archived from the original on 2023-07-12. Retrieved 2024-07-26.

[D0k2a-66] Dey, Nolan (March 28, 2023). "Cerebras-GPT: A Family of Open, Compute-efficient, Large Language Models". Cerebras. Archived from the original on March 28, 2023. Retrieved March 28, 2023.

[falcon-67] "Abu Dhabi-based TII launches its own version of ChatGPT". tii.ae. Archived from the original on 2023-04-03. Retrieved 2023-04-03.

[Xb1gq-68] Penedo, Guilherme; Malartic, Quentin; Hesslow, Daniel; Cojocaru, Ruxandra; Cappelli, Alessandro; Alobeidli, Hamza; Pannier, Baptiste; Almazrouei, Ebtesam; Launay, Julien (2023-06-01). "The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only". arXiv:2306.01116 [cs.CL].

[gzTNw-69] "tiiuae/falcon-40b · Hugging Face". huggingface.co. 2023-06-09. Retrieved 2023-06-20.

[Wmlcs-70] "UAE's Falcon 40B, World's Top-Ranked AI Model from Technology Innovation Institute, is Now Royalty-Free". 31 May 2023. Archived from the original on 2024-02-08.

[nGOSu-71] Wu, Shijie; Irsoy, Ozan; Lu, Steven; Dabravolski, Vadim; Dredze, Mark; Gehrmann, Sebastian; Kambadur, Prabhanjan; Rosenberg, David; Mann, Gideon (March 30, 2023). "BloombergGPT: A Large Language Model for Finance". arXiv:2303.17564 [cs.LG].

[9WSFw-72] Ren, Xiaozhe; Zhou, Pingyi; Meng, Xinfan; Huang, Xinjing; Wang, Yadao; Wang, Weichao; Li, Pengfei; Zhang, Xiaoda; Podolskiy, Alexander; Arshinov, Grigory; Bout, Andrey; Piontkovskaya, Irina; Wei, Jiansheng; Jiang, Xin; Su, Teng; Liu, Qun; Yao, Jun (March 19, 2023). "PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing". arXiv:2303.10845 [cs.CL].

[JiOl8-73] Köpf, Andreas; Kilcher, Yannic; von Rütte, Dimitri; Anagnostidis, Sotiris; Tam, Zhi-Rui; Stevens, Keith; Barhoum, Abdullah; Duc, Nguyen Minh; Stanley, Oliver; Nagyfi, Richárd; ES, Shahul; Suri, Sameer; Glushkov, David; Dantuluri, Arnav; Maguire, Andrew (2023-04-14). "OpenAssistant Conversations – Democratizing Large Language Model Alignment". arXiv:2304.07327 [cs.CL].

[74] Wrobel, Sharon. "Tel Aviv startup rolls out new advanced AI language model to rival OpenAI". The Times of Israel. ISSN 0040-7909. Archived from the original on 2023-07-24. Retrieved 2023-07-24.

[75] Wiggers, Kyle (2023-04-13). "With Bedrock, Amazon enters the generative AI race". TechCrunch. Archived from the original on 2023-07-24. Retrieved 2023-07-24.

[cnbc-20230516-76] Elias, Jennifer (16 May 2023). "Google's newest A.I. model uses nearly five times more text data for training than its predecessor". CNBC. Archived from the original on 16 May 2023. Retrieved 18 May 2023.

[pWyLA-77] "Introducing PaLM 2". Google. May 10, 2023. Archived from the original on May 18, 2023. Retrieved May 18, 2023.

[2306.11644-78] Gunasekar, Suriya; Zhang, Yi; Aneja, Jyoti; Caio César Teodoro Mendes; Allie Del Giorno; Gopi, Sivakanth; Javaheripi, Mojan; Kauffmann, Piero; Gustavo de Rosa; Saarikivi, Olli; Salim, Adil; Shah, Shital; Harkirat Singh Behl; Wang, Xin; Bubeck, Sébastien; Eldan, Ronen; Adam Tauman Kalai; Yin Tat Lee; Li, Yuanzhi (2023). "Textbooks Are All You Need". arXiv:2306.11644 [cs.CL].

[meta-20230719-79] "Introducing Llama 2: The Next Generation of Our Open Source Large Language Model". Meta AI. 2023. Archived from the original on 2024-01-05. Retrieved 2023-07-19.

[80] "llama/MODEL_CARD.md at main · meta-llama/llama". GitHub. Archived from the original on 2024-05-28. Retrieved 2024-05-28.

[81] "Claude 2". anthropic.com. Archived from the original on 15 December 2023. Retrieved 12 December 2023.

[82] Nirmal, Dinesh (2023-09-07). "Building AI for business: IBM's Granite foundation models". IBM Blog. Archived from the original on 2024-07-22. Retrieved 2024-08-11.

[mistral-20230927-83] "Announcing Mistral 7B". Mistral. 2023. Archived from the original on 2024-01-06. Retrieved 2023-10-06.

[84] "Introducing Claude 2.1". anthropic.com. Archived from the original on 15 December 2023. Retrieved 12 December 2023.

[85] xai-org/grok-1, xai-org, 2024-03-19, archived from the original on 2024-05-28, retrieved 2024-03-19

[86] "Grok-1 model card". x.ai. Retrieved 12 December 2023.

[87] "Gemini – Google DeepMind". deepmind.google. Archived from the original on 8 December 2023. Retrieved 12 December 2023.

[88] Franzen, Carl (11 December 2023). "Mistral shocks AI community as latest open source model eclipses GPT-3.5 performance". VentureBeat. Archived from the original on 11 December 2023. Retrieved 12 December 2023.

[89] "Mixtral of experts". mistral.ai. 11 December 2023. Archived from the original on 13 February 2024. Retrieved 12 December 2023.

[:1-90] DeepSeek-AI; Bi, Xiao; Chen, Deli; Chen, Guanting; Chen, Shanhuang; Dai, Damai; Deng, Chengqi; Ding, Honghui; Dong, Kai (2024-01-05), DeepSeek LLM: Scaling Open-Source Language Models with Longtermism, arXiv:2401.02954

[:9-91] Hughes, Alyssa (12 December 2023). "Phi-2: The surprising power of small language models". Microsoft Research. Archived from the original on 12 December 2023. Retrieved 13 December 2023.

[92] "Our next-generation model: Gemini 1.5". Google. 15 February 2024. Archived from the original on 16 February 2024. Retrieved 16 February 2024. This means 1.5 Pro can process vast amounts of information in one go — including 1 hour of video, 11 hours of audio, codebases with over 30,000 lines of code or over 700,000 words. In our research, we've also successfully tested up to 10 million tokens.

[gemma-93] "Gemma" – via GitHub.

[94] "OLMo: Open Language Model | Ai2". allenai.org. Retrieved 2026-03-17.

[95] Groeneveld, Dirk; Beltagy, Iz; Walsh, Pete; Bhagia, Akshita; Kinney, Rodney; Tafjord, Oyvind; Jha, Ananya Harsh; Ivison, Hamish; Magnusson, Ian (2024-06-07), OLMo: Accelerating the Science of Language Models, arXiv, doi:10.48550/arXiv.2402.00838, arXiv:2402.00838, retrieved 2026-03-17

[96] "Introducing the next generation of Claude". www.anthropic.com. Archived from the original on 2024-03-04. Retrieved 2024-03-04.

[97] "Databricks Open Model License". Databricks. 27 March 2024. Retrieved 6 August 2025.

[98] "Databricks Open Model Acceptable Use Policy". Databricks. 27 March 2024. Retrieved 6 August 2025.

[FugakuLLMRelease-99] "Release of "Fugaku-LLM" - a large language model trained on the supercomputer "Fugaku"". Fujitsu. 10 May 2024. Retrieved 20 April 2026.

[100] "Fugaku-LLM Terms of Use". 23 April 2024. Retrieved 6 August 2025 – via Hugging Face.

[101] "Fugaku-LLM/Fugaku-LLM-13B · Hugging Face". huggingface.co. Archived from the original on 2024-05-17. Retrieved 2024-05-17.

[102] Dickson, Ben (22 May 2024). "Meta introduces Chameleon, a state-of-the-art multimodal model". VentureBeat.

[103] "chameleon/LICENSE at e3b711ef63b0bb3a129cf0cf0918e36a32f26e2c · facebookresearch/chameleon". Meta Research. Retrieved 6 August 2025 – via GitHub.

[104] AI, Mistral (2024-04-17). "Cheaper, Better, Faster, Stronger". mistral.ai. Archived from the original on 2024-05-05. Retrieved 2024-05-05.

[phi-3-blog-105] Bilenko, Misha (23 April 2024). "Introducing Phi-3: Redefining what's possible with SLMs". azure.microsoft.com. Archived from the original on 8 May 2026. Retrieved 8 May 2026.

[106] Abdin, Marah; et al. (2024). "Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone". arXiv:2404.14219 [cs.CL].

[107] "Qwen2". GitHub. Archived from the original on 2024-06-17. Retrieved 2024-06-17.

[108] DeepSeek-AI; Liu, Aixin; Feng, Bei; Wang, Bin; Wang, Bingxuan; Liu, Bo; Zhao, Chenggang; Deng, Chengqi; Ruan, Chong (2024-06-19), DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model, arXiv:2405.04434

[109] "NVIDIA Open Models License". Nvidia. 16 June 2025. Retrieved 6 August 2025.

[110] "Trustworthy AI". Nvidia. 27 June 2024. Retrieved 6 August 2025.

[111] "nvidia/Nemotron-4-340B-Base · Hugging Face". huggingface.co. 2024-06-14. Archived from the original on 2024-06-15. Retrieved 2024-06-15.

[112] "Nemotron-4 340B | Research". research.nvidia.com. Archived from the original on 2024-06-15. Retrieved 2024-06-15.

[113] "Introducing Claude 3.5 Sonnet". www.anthropic.com. Retrieved 8 August 2025.

[114] "Introducing computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku". www.anthropic.com. Retrieved 8 August 2025.

[115] Llama Team, AI @ Meta (July 23, 2024). "The Llama 3 Herd of Models".

[116] "llama-models/models/llama3_1/MODEL_CARD.md at main · meta-llama/llama-models". GitHub. Archived from the original on 2024-07-23. Retrieved 2024-07-23.

[117] "LICENSE · xai-org/grok-2 at main". 5 November 2025. Retrieved 18 November 2025 – via Hugging Face.

[118] "xAI Acceptable Use Policy". xAI. 2 January 2025. Retrieved 18 November 2025.

[119] Weatherbed, Jess (14 August 2024). "xAI's new Grok-2 chatbots bring AI image generation to X". The Verge. Retrieved 18 November 2025.

[120] Ha, Anthony (24 August 2025). "Elon Musk says xAI has open sourced Grok 2.5". TechCrunch. Retrieved 18 November 2025.

[121] "Introducing OpenAI o1". openai.com. Retrieved 8 August 2025.

[122] Paul, Katie; Tong, Anna (13 September 2024). "OpenAI launches new series of AI models with 'reasoning' abilities". Reuters.

[123] Jindal, Siddharth (24 October 2024). "Sarvam AI Launches Sarvam-1, Outperforms Gemma-2 and Llama-3.2". Analytics India Magazine. Archived from the original on 25 July 2025. Retrieved 20 April 2026.

[124] "LICENSE.md · sarvamai/sarvam-1". 23 October 2024. Retrieved 20 April 2026 – via Hugging Face.

[Mistral_models_overview-125] "Models Overview". mistral.ai. Retrieved 2025-03-03.

[126] "OLMo 2: The best fully open language model to date | Ai2". allenai.org. Retrieved 2026-03-17.

[:7-127] OLMo, Team; Walsh, Pete; Soldaini, Luca; Groeneveld, Dirk; Lo, Kyle; Arora, Shane; Bhagia, Akshita; Gu, Yuling; Huang, Shengyi (2025-10-08), 2 OLMo 2 Furious, arXiv, doi:10.48550/arXiv.2501.00656, arXiv:2501.00656, retrieved 2026-03-17

[128] "Phi-4 Model Card". huggingface.co. Retrieved 2025-11-11.

[129] "Introducing Phi-4: Microsoft's Newest Small Language Model Specializing in Complex Reasoning". techcommunity.microsoft.com. Retrieved 2025-11-11.

[130] deepseek-ai/DeepSeek-V3, DeepSeek, 2024-12-26, retrieved 2024-12-26

[131] Feng, Coco (25 March 2025). "DeepSeek wows coders with more powerful open-source V3 model". South China Morning Post. Retrieved 6 April 2025.

[132] Amazon Nova Micro, Lite, and Pro - AWS AI Service Cards, Amazon, 2024-12-27, retrieved 2024-12-27

[133] deepseek-ai/DeepSeek-R1, DeepSeek, 2025-01-21, retrieved 2025-01-21

[134] DeepSeek-AI; Guo, Daya; Yang, Dejian; Zhang, Haowei; Song, Junxiao; Zhang, Ruoyu; Xu, Runxin; Zhu, Qihao; Ma, Shirong (2025-01-22), DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning, arXiv:2501.12948

[135] Qwen; Yang, An; Yang, Baosong; Zhang, Beichen; Hui, Binyuan; Zheng, Bo; Yu, Bowen; Li, Chengyuan; Liu, Dayiheng (2025-01-03), Qwen2.5 Technical Report, arXiv:2412.15115

[:0-136] MiniMax; Li, Aonian; Gong, Bangwei; Yang, Bo; Shan, Boji; Liu, Chang; Zhu, Cheng; Zhang, Chunhao; Guo, Congchao (2025-01-14), MiniMax-01: Scaling Foundation Models with Lightning Attention, arXiv:2501.08313

[137] MiniMax-AI/MiniMax-01, MiniMax, 2025-01-26, retrieved 2025-01-26

[138] Kavukcuoglu, Koray (5 February 2025). "Gemini 2.0 is now available to everyone". Google. Retrieved 6 February 2025.

[139] "Gemini 2.0: Flash, Flash-Lite and Pro". Google for Developers. Retrieved 6 February 2025.

[140] Franzen, Carl (5 February 2025). "Google launches Gemini 2.0 Pro, Flash-Lite and connects reasoning model Flash Thinking to YouTube, Maps and Search". VentureBeat. Retrieved 6 February 2025.

[141] "Grok 3 Beta — The Age of Reasoning Agents". x.ai. Retrieved 2025-02-22.

[142] "Claude 3.7 Sonnet and Claude Code". www.anthropic.com. Retrieved 8 August 2025.

[143] "Introducing GPT-4.5". openai.com. Retrieved 8 August 2025.

[144] Kavukcuoglu, Koray (25 March 2025). "Gemini 2.5: Our most intelligent AI model". Google. Retrieved 23 September 2025.

[145] "meta-llama/Llama-4-Maverick-17B-128E · Hugging Face". huggingface.co. 2025-04-05. Retrieved 2025-04-06.

[146] "The Llama 4 herd: The beginning of a new era of natively multimodal AI innovation". ai.meta.com. Archived from the original on 2025-04-05. Retrieved 2025-04-05.

[147] "Introducing OpenAI o3 and o4-mini". openai.com. Retrieved 8 August 2025.

[148] Team, Qwen (2025-04-29). "Qwen3: Think Deeper, Act Faster". Qwen. Retrieved 2025-04-29.

[149] "Introducing Claude 4". www.anthropic.com. Retrieved 8 August 2025.

[150] Yadav, Nandini (2025-05-26). "Indian AI startup launches Sarvam-M model: What is it, why is everyone talking about it". India Today. Retrieved 2026-03-18.

[151] "Sarvam-M: Open Source Hybrid Indic LLM | Sarvam AI". Sarvam AI. 2025-05-23. Retrieved 2026-03-18.

[152] "Grok 4". x.ai. 9 July 2025. Archived from the original on 9 May 2026. Retrieved 9 May 2026.

[:12-153] Pundalik, Kundeshwar; Sawarkar, Piyush; Sahoo, Nihar; Shinde, Abhishek; Chanda, Prateek; Goswami, Vedant; Nagpal, Ajay; Singh, Atul; Thakur, Viraj (2025-07-16), PARAM-1 BharatGen 2.9B Model, arXiv, doi:10.48550/arXiv.2507.13390, arXiv:2507.13390, retrieved 2026-03-18

[155] "README.md · bharatgenai/Param-1". 24 February 2026. Retrieved 12 April 2026 – via Hugging Face.

[156] "GLM-4.5: Reasoning, Coding, and Agentic Abilities". z.ai. Retrieved 2025-08-06.

[158] "zai-org/GLM-4.5 · Hugging Face". huggingface.co. 2025-08-04. Retrieved 2025-08-06.

[159] Whitwam, Ryan (5 August 2025). "OpenAI announces two "gpt-oss" open AI models, and you can download them today". Ars Technica. Retrieved 6 August 2025.

[160] "Claude Opus 4.1". www.anthropic.com. Retrieved 8 August 2025.

[161] "Introducing GPT-5". openai.com. 7 August 2025. Retrieved 8 August 2025.

[162] "OpenAI Platform: GPT-5 Model Documentation". openai.com. Retrieved 18 August 2025.

[163] "deepseek-ai/DeepSeek-V3.1 · Hugging Face". huggingface.co. 2025-08-21. Retrieved 2025-08-25.

[164] "DeepSeek-V3.1 Release | DeepSeek API Docs". api-docs.deepseek.com. Retrieved 2025-08-25.

[165] "Apertus: Ein vollständig offenes, transparentes und mehrsprachiges Sprachmodell" (in German). Zürich: ETH Zürich. 2025-09-02. Retrieved 2025-11-07.

[166] Kirchner, Malte (2025-09-02). "Apertus: Schweiz stellt erstes offenes und mehrsprachiges KI-Modell vor". heise online (in German). Retrieved 2025-11-07.

[167] "Introducing Claude Sonnet 4.5". www.anthropic.com. Retrieved 29 September 2025.

[168] "GLM-4.6: Advanced Agentic, Reasoning and Coding Capabilities". z.ai. Retrieved 2025-10-01.

[169] "zai-org/GLM-4.6 · Hugging Face". huggingface.co. 2025-09-30. Retrieved 2025-10-01.

[170] "GLM-4.6". modelscope.cn. Retrieved 2025-10-01.

[171] "A new era of intelligence with Gemini 3". Google. 18 November 2025. Retrieved 5 January 2026.

[172] "Olmo 3: Charting a path through the model flow to lead open-source AI". Ai2. 20 November 2025.

[:8-173] Olmo, Team; Ettinger, Allyson; Bertsch, Amanda; Kuehl, Bailey; Graham, David; Heineman, David; Groeneveld, Dirk; Brahman, Faeze; Timbers, Finbarr (2025-12-15), Olmo 3, arXiv, doi:10.48550/arXiv.2512.13961, arXiv:2512.13961, retrieved 2026-03-17

[174] "Introducing Claude Opus 4.5". www.anthropic.com. Retrieved 8 January 2026.

[Binder-2025-175] Binder, Matt (3 December 2025). "DeepSeek v3.2: What it is, how it compares to ChatGPT, how to try it". Mashable. Retrieved 12 April 2026.

[DeepSeek-V3.2_release-176] "DeepSeek-V3.2 Release". DeepSeek API Docs. 1 December 2025. Retrieved 12 April 2026.

[177] "DeepSeek-V3.2: Efficient Reasoning & Agentic AI". Hugging Face. 1 December 2025. Retrieved 12 April 2026.

[178] "Advancing science and math with GPT-5.2". openai.com. Retrieved 4 January 2026.

[179] "Pushing Qwen3-Max-Thinking Beyond its Limits". Qwen. 25 January 2026. Archived from the original on 6 February 2026. Retrieved 6 February 2026. We further enhance Qwen3-Max-Thinking with two key innovations: (1) adaptive tool-use capabilities [...]; and (2) advanced test-time scaling techniques [...]. [...] We limit [parallel trajectories] and redirect saved computation to iterative self-reflection guided by a "take-experience" mechanism.

[180] Team, Kimi; Bai, Yifan; Bao, Yiping; Charles, Y.; Chen, Cheng; Chen, Guanduo; Chen, Haiting; Chen, Huarong; Chen, Jiahao (2026-02-03), Kimi K2: Open Agentic Intelligence, arXiv, doi:10.48550/arXiv.2507.20534, arXiv:2507.20534, retrieved 2026-03-18

[181] Team, Kimi; Bai, Tongtong; Bai, Yifan; Bao, Yiping; Cai, S. H.; Cao, Yuan; Charles, Y.; Che, H. S.; Chen, Cheng (2026-02-02), Kimi K2.5: Visual Agentic Intelligence, arXiv, doi:10.48550/arXiv.2602.02276, arXiv:2602.02276, retrieved 2026-03-18

[182] "Kimi K2.5: Chat with Kimi K2.5 for Free". Kimi K2.5. Retrieved 2026-03-18.

[183] Jiang, Ben (3 February 2026). "Compact AI model from China's StepFun outshines rivals from DeepSeek, Moonshot". South China Morning Post. Archived from the original on 4 February 2026. Retrieved 14 April 2026.

[184] "Step 3.5 Flash: Fast Enough to Think. Reliable Enough to Act". StepFun. 12 February 2026. Retrieved 20 April 2026.

[185] "stepfun-ai/Step-3.5-Flash". 14 March 2026. Retrieved 14 April 2026 – via Hugging Face.

[186] "LICENSE · bharatgenai/Param2-17B-A2.4B-Thinking". 16 February 2026. Retrieved 12 April 2026 – via Hugging Face.

[187] "bharatgenai/Param2-17B-A2.4B-Thinking". Retrieved 2026-03-08 – via Hugging Face.

[188] "sarvamai/sarvam-1-v0.5 · Hugging Face". huggingface.co. Retrieved 2026-03-08.

[sarvam-30b-105b-blog-190] "Open-Sourcing Sarvam 30B and 105B". Sarvam AI. 6 March 2026. Archived from the original on 8 May 2026. Retrieved 8 May 2026.

[191] "sarvamai/sarvam-105b · Hugging Face". huggingface.co. Retrieved 2026-03-08.

[192] Kumar, Abhijeet (19 February 2026). "Why Sarvam's new 105B model marks a shift in India's sovereign AI ambitions". Business Standard.

[193] Singh, Jagmeet (2026-02-18). "Indian AI lab Sarvam's new models are a major bet on the viability of open source AI". TechCrunch. Retrieved 2026-03-18.

[194] Marquez, Javier (17 March 2026). "Una IA para reunir todas las funciones posibles: la apuesta de Mistral con Small 4 es hacer más con menos cosas" [An AI to bring together all possible functions: Mistral's bet with Small 4 is to do more with less]. Xataka (in Spanish). Retrieved 20 April 2026.

[195] "Introducing Mistral Small 4". Mistral AI. Retrieved 20 April 2026.

[196] "Xiaomi Launches Powerful AI Model MiMo-V2 Pro With 1 Trillion Parametres, 1 Million Token Context Window". NDTV Profit. 19 March 2026.

[197] "Mystery AI model revealed to be Xiaomi's following suspicions it was DeepSeek's". Reuters. 18 March 2026. Retrieved 3 April 2026.

[198] Whitwam, Ryan (2 April 2026). "Google announces Gemma 4 open AI models, switches to Apache 2.0 license". Ars Technica. Retrieved 3 April 2026.

[199] Mann, Tobias (2 April 2026). "Google battles Chinese open weights models with Gemma 4". Retrieved 3 April 2026.

[200] Franzen, Carl (7 April 2026). "AI joins the 8-hour work day as GLM ships 5.1 open source LLM, beating Opus 4.6 and GPT-5.4 on SWE-Bench Pro". VentureBeat. Retrieved 12 April 2026.

[201] "GLM-5.1: Towards Long-Horizon Tasks". Z.ai. Retrieved 12 April 2026.

[202] "Introducing Muse Spark: Scaling Towards Personal Superintelligence". ai.meta.com. 8 April 2026. Archived from the original on 9 May 2026. Retrieved 9 May 2026.

[203] "A Chinese AI called 'Qwen3.6-35B-A3B,' which is more powerful than Gemma4, has been released as an open model". Gigazine. 17 April 2026. Retrieved 17 April 2026.

[204] "README.md · Qwen/Qwen3.6-35B-A3B". 15 April 2026. Retrieved 17 April 2026 – via Hugging Face.

[Butts-2026-205] Butts, Dylan (24 April 2026). "China's DeepSeek releases preview of long-awaited V4 model as AI race intensifies". CNBC.

[206] "MiMo-V2.5-Pro | Xiaomi". mimo.xiaomi.com. Retrieved 2026-05-03.

[207] Thomas, Prasanth Aby (28 April 2026). "Xiaomi releases MIT‑licensed MiMo models for long‑running AI agents". Computerworld. Retrieved 6 May 2026.

[208] "XiaomiMiMo/MiMo-V2.5". Hugging Face. XiaomiMiMo. Retrieved 3 May 2026.

[209] "Gemini 3.5: frontier intelligence with action". Google. 19 May 2026.

[210] "Introducing Claude Opus 4.8". Anthropic. 28 May 2026. Archived from the original on 30 May 2026. Retrieved 30 May 2026.

[212] "Step 3.7 Flash". StepFun. 29 May 2026. Archived from the original on 30 May 2026. Retrieved 30 May 2026.

