List of large language models

From Wikipedia, the free encyclopedia


A large language model (LLM) is a type of machine learning model designed for natural language processing tasks such as language generation. LLMs are language models with many parameters, and are trained with self-supervised learning on a vast amount of text.

List


For the training cost column, 1 petaFLOP-day = 1 petaFLOP/s × 1 day = 8.64×10^19 FLOP. Where several sizes of a model were released, only the training cost of the largest model is listed.
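
As a rough sanity check on these figures, the sketch below converts raw FLOP counts into petaFLOP-days and estimates training compute from parameter and token counts using the common C ≈ 6·N·D rule of thumb. The helper names are illustrative, and the 6·N·D approximation is an assumption for this sketch, not the method used by the cited sources.

```python
# Minimal sketch: unit conversion for the training cost column.
# Assumption: training compute is approximated as C ≈ 6 * N * D
# (6 FLOPs per parameter per training token); the table's sources
# may compute or measure costs differently.

PETAFLOP_DAY_IN_FLOP = 1e15 * 86_400  # 1 petaFLOP/s sustained for one day = 8.64e19 FLOP


def flop_to_petaflop_days(flop: float) -> float:
    """Convert a raw FLOP count into petaFLOP-days."""
    return flop / PETAFLOP_DAY_IN_FLOP


def estimate_training_flop(parameters: float, tokens: float) -> float:
    """Rough training-compute estimate via C ≈ 6 * N * D."""
    return 6.0 * parameters * tokens


if __name__ == "__main__":
    # Example using the GPT-3 row: 175 billion parameters, 300 billion tokens.
    flop = estimate_training_flop(175e9, 300e9)   # ≈ 3.15e23 FLOP
    print(round(flop_to_petaflop_days(flop)))     # ≈ 3646, close to the 3640[22] listed in the table
```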

| Name | Release date[a] | Developer | Number of parameters (billion)[b] | Corpus size | Training cost (petaFLOP-day) | License[c] | Notes |
|---|---|---|---|---|---|---|---|
| GPT-1 | June 11, 2018 | OpenAI | 0.117 | Unknown | 1[1] | MIT[2] | First GPT model, decoder-only transformer. Trained for 30 days on 8 P600 GPUs.[3] |
| BERT | October 2018 | Google | 0.340[4] | 3.3 billion words[4] | 9[5] | Apache 2.0[6] | An early and influential language model.[7] Encoder-only and thus not built to be prompted or generative.[8] Training took 4 days on 64 TPUv2 chips.[9] |
| T5 | October 2019 | Google | 11[10] | 34 billion tokens[10] | | Apache 2.0[11] | Base model for many Google projects, such as Imagen.[12] |
| XLNet | June 2019 | Google | 0.340[13] | 33 billion words | 330 | Apache 2.0[14] | An alternative to BERT; designed as encoder-only. Trained on 512 TPU v3 chips for 5.5 days.[15] |
| GPT-2 | February 2019 | OpenAI | 1.5[16] | 40 GB[17] (~10 billion tokens)[18] | 28[19] | MIT[20] | Trained on 32 TPUv3 chips for 1 week.[19] |
| GPT-3 | May 2020 | OpenAI | 175[21] | 300 billion tokens[18] | 3640[22] | Proprietary | A fine-tuned variant of GPT-3, termed GPT-3.5, was made available to the public through a web interface called ChatGPT in 2022.[23] |
| GPT-Neo | March 2021 | EleutherAI | 2.7[24] | 825 GiB[25] | Unknown | MIT[26] | The first of a series of free GPT-3 alternatives released by EleutherAI. GPT-Neo outperformed an equivalent-size GPT-3 model on some benchmarks, but was significantly worse than the largest GPT-3.[26] |
| GPT-J | June 2021 | EleutherAI | 6[27] | 825 GiB[25] | 200[28] | Apache 2.0 | GPT-3-style language model. |
| Megatron-Turing NLG | October 2021[29] | Microsoft and Nvidia | 530[30] | 338.6 billion tokens[30] | 38000[31] | Unreleased | Trained for 3 months on over 2000 A100 GPUs on the Nvidia Selene supercomputer, for over 3 million GPU-hours.[31] |
| Ernie 3.0 Titan | December 2021 | Baidu | 260[32] | 4 TB | Unknown | Proprietary | Chinese-language LLM. Ernie Bot is based on this model. |
| Claude[33] | December 2021 | Anthropic | 52[34] | 400 billion tokens[34] | Unknown | Proprietary | Fine-tuned for desirable behavior in conversations.[35] |
| GLaM (Generalist Language Model) | December 2021 | Google | 1200[36] | 1.6 trillion tokens[36] | 5600[36] | Proprietary | Sparse mixture-of-experts model, making it more expensive to train but cheaper to run inference compared to GPT-3. |
| Gopher | December 2021 | DeepMind | 280[37] | 300 billion tokens[38] | 5833[39] | Proprietary | Later developed into the Chinchilla model. |
| LaMDA (Language Models for Dialog Applications) | January 2022 | Google | 137[40] | 1.56T words,[40] 168 billion tokens[38] | 4110[41] | Proprietary | Specialized for response generation in conversations. |
| GPT-NeoX | February 2022 | EleutherAI | 20[42] | 825 GiB[25] | 740[28] | Apache 2.0 | Based on the Megatron architecture. |
| Chinchilla | March 2022 | DeepMind | 70[43] | 1.4 trillion tokens[43][38] | 6805[39] | Proprietary | Reduced-parameter model trained on more data. Used in the Sparrow bot. Often cited for its neural scaling law. |
| PaLM (Pathways Language Model) | April 2022 | Google | 540[44] | 768 billion tokens[43] | 29,250[39] | Proprietary | Trained for ~60 days on ~6000 TPU v4 chips.[39] |
| OPT (Open Pretrained Transformer) | May 2022 | Meta | 175[45] | 180 billion tokens[46] | 310[28] | Non-commercial research[d] | GPT-3 architecture with some adaptations from Megatron. Uniquely, the training logbook written by the team was published.[47] |
| YaLM 100B | June 2022 | Yandex | 100[48] | 1.7 TB[48] | Unknown | Apache 2.0 | English-Russian model based on Microsoft's Megatron-LM. |
| Minerva | June 2022 | Google | 540[49] | 38.5B tokens from webpages filtered for mathematical content and from papers submitted to the arXiv preprint server[49] | Unknown | Proprietary | For solving "mathematical and scientific questions using step-by-step reasoning".[50] Initialized from PaLM models, then fine-tuned on mathematical and scientific data. |
| BLOOM | July 2022 | Large collaboration led by Hugging Face | 175[51] | 350 billion tokens (1.6 TB)[52] | Unknown | Responsible AI | Essentially GPT-3 but trained on a multilingual corpus (30% English, excluding programming languages). |
| Galactica | November 2022 | Meta | 120 | 106 billion tokens[53] | Unknown | CC-BY-NC-4.0 | Trained on scientific text and modalities. |
| AlexaTM (Teacher Models) | November 2022 | Amazon | 20[54] | 1.3 trillion[55] | Unknown | Proprietary[56] | Bidirectional sequence-to-sequence architecture. |
| Llama | February 2023 | Meta AI | 65[57] | 1.4 trillion[57] | 6300[58] | Non-commercial research[e] | Corpus has 20 languages. "Overtrained" (compared to the Chinchilla scaling law) for better performance with fewer parameters.[57] |
| GPT-4 | March 2023 | OpenAI | Unknown[f] (according to rumors: 1760)[60] | Unknown | Unknown, estimated 230,000 | Proprietary | Available to all ChatGPT users and used in several products. |
| Cerebras-GPT | March 2023 | Cerebras | 13[61] | | 270[28] | Apache 2.0 | Trained with the Chinchilla formula. |
| Falcon | March 2023 | Technology Innovation Institute | 40[62] | 1 trillion tokens, from RefinedWeb (filtered web text corpus)[63] plus some "curated corpora"[64] | 2800[58] | Apache 2.0[65] | |
| BloombergGPT | March 2023 | Bloomberg L.P. | 50 | 363 billion token dataset based on Bloomberg's data sources, plus 345 billion tokens from general-purpose datasets[66] | Unknown | Unreleased | Trained on financial data from proprietary sources, for financial tasks. |
| PanGu-Σ | March 2023 | Huawei | 1085 | 329 billion tokens[67] | Unknown | Proprietary | |
| OpenAssistant[68] | March 2023 | LAION | 17 | 1.5 trillion tokens | Unknown | Apache 2.0 | Trained on crowdsourced open data. |
| Jurassic-2[69] | March 2023 | AI21 Labs | Unknown | Unknown | Unknown | Proprietary | Multilingual.[70] |
| PaLM 2 (Pathways Language Model 2) | May 2023 | Google | 340[71] | 3.6 trillion tokens[71] | 85,000[58] | Proprietary | Was used in the Bard chatbot.[72] |
| YandexGPT | May 17, 2023 | Yandex | Unknown | Unknown | Unknown | Proprietary | Used in the Alice chatbot. |
| Llama 2 | July 2023 | Meta AI | 70[73] | 2 trillion tokens[73] | 21,000 | Llama 2 license | 1.7 million A100-hours.[74] |
| Claude 2 | July 2023 | Anthropic | Unknown | Unknown | Unknown | Proprietary | Used in the Claude chatbot.[75] |
| Granite 13b | July 2023 | IBM | Unknown | Unknown | Unknown | Proprietary | Used in IBM Watsonx.[76] |
| Mistral 7B | September 2023 | Mistral AI | 7.3[77] | Unknown | Unknown | Apache 2.0 | |
| YandexGPT 2 | September 7, 2023 | Yandex | Unknown | Unknown | Unknown | Proprietary | Used in the Alice chatbot. |
| Claude 2.1 | November 2023 | Anthropic | Unknown | Unknown | Unknown | Proprietary | Used in the Claude chatbot. Has a context window of 200,000 tokens, or ~500 pages.[78] |
| Grok 1[79] | November 2023 | xAI | 314 | Unknown | Unknown | Apache 2.0 | Used in the Grok chatbot. Grok 1 has a context length of 8,192 tokens and has access to X (Twitter).[80] |
| Gemini 1.0 | December 2023 | Google DeepMind | Unknown | Unknown | Unknown | Proprietary | Multimodal model, comes in three sizes. Used in the chatbot of the same name.[81] |
| Mixtral 8x7B | December 2023 | Mistral AI | 46.7 | Unknown | Unknown | Apache 2.0 | Outperforms GPT-3.5 and Llama 2 70B on many benchmarks.[82] Mixture-of-experts model, with 12.9 billion parameters activated per token.[83] |
| DeepSeek-LLM | November 29, 2023 | DeepSeek | 67 | 2T tokens[84]: table 2 | 12,000 | DeepSeek License | Trained on English and Chinese text. 1e24 FLOPs for the 67B model, 1e23 FLOPs for the 7B model.[84]: figure 5 |
| Phi-2 | December 2023 | Microsoft | 2.7 | 1.4T tokens | 419[85] | MIT | Trained on real and synthetic "textbook-quality" data, for 14 days on 96 A100 GPUs.[85] |
| Gemini 1.5 | February 2024 | Google DeepMind | Unknown | Unknown | Unknown | Proprietary | Multimodal model, based on a mixture-of-experts (MoE) architecture. Context window above 1 million tokens.[86] |
| Gemini Ultra | February 2024 | Google DeepMind | Unknown | Unknown | Unknown | Proprietary | |
| Gemma | February 2024 | Google DeepMind | 7 | 6T tokens | Unknown | Gemma Terms of Use[87] | |
| Claude 3 | March 2024 | Anthropic | Unknown | Unknown | Unknown | Proprietary | Includes three models: Haiku, Sonnet, and Opus.[88] |
| DBRX | March 2024 | Databricks and Mosaic ML | 136 | 12T tokens | Unknown | Databricks Open Model License[89][90] | Training cost 10 million USD. |
| YandexGPT 3 Pro | March 28, 2024 | Yandex | Unknown | Unknown | Unknown | Proprietary | Used in the Alice chatbot. |
| Fugaku-LLM | May 2024 | Fujitsu, Tokyo Institute of Technology, etc. | 13 | 380B tokens | Unknown | Fugaku-LLM Terms of Use[91] | The largest model ever trained on CPUs only, on the Fugaku supercomputer.[92] |
| Chameleon | May 2024 | Meta AI | 34[93] | 4.4 trillion | Unknown | Non-commercial research[94] | |
| Mixtral 8x22B | April 17, 2024 | Mistral AI | 141 | Unknown | Unknown | Apache 2.0[95] | |
| Phi-3 | April 23, 2024 | Microsoft | 14[96] | 4.8T tokens | Unknown | MIT | Microsoft markets them as "small language models".[97] |
| Granite Code Models | May 2024 | IBM | Unknown | Unknown | Unknown | Apache 2.0 | |
| YandexGPT 3 Lite | May 28, 2024 | Yandex | Unknown | Unknown | Unknown | Proprietary | Used in the Alice chatbot. |
| Qwen2 | June 2024 | Alibaba Cloud | 72[98] | 3T tokens | Unknown | Qwen License | Multiple sizes, the smallest being 0.5B. |
| DeepSeek-V2 | June 2024 | DeepSeek | 236 | 8.1T tokens | 28,000 | DeepSeek License | 1.4M hours on H800 GPUs.[99] |
| Nemotron-4 | June 2024 | Nvidia | 340 | 9T tokens | 200,000 | NVIDIA Open Model License[100][101] | Trained for 1 epoch, on 6144 H100 GPUs between December 2023 and May 2024.[102][103] |
| Claude 3.5 | June 2024 | Anthropic | Unknown | Unknown | Unknown | Proprietary | Initially, only one model, Sonnet, was released.[104] In October 2024, Sonnet 3.5 was upgraded and Haiku 3.5 became available.[105] |
| Llama 3.1 | July 2024 | Meta AI | 405 | 15.6T tokens | 440,000 | Llama 3 license | The 405B version took 31 million hours on H100-80GB, at 3.8E25 FLOPs.[106][107] |
| Grok-2 | August 14, 2024 | xAI | Unknown | Unknown | Unknown | xAI Community License Agreement[108][109] | Originally closed-source, then re-released as "Grok 2.5" under a source-available license in August 2025.[110][111] |
| OpenAI o1 | September 12, 2024 | OpenAI | Unknown | Unknown | Unknown | Proprietary | Reasoning model.[112] |
| YandexGPT 4 Lite and Pro | October 24, 2024 | Yandex | Unknown | Unknown | Unknown | Proprietary | Used in the Alice chatbot. |
| Sarvam 1 | October 24, 2024 | Sarvam AI | 2 | 2T tokens | Unknown | Unknown | Multilingual LLM optimized for 10+ Indic languages and English; aims at efficient inference; built on Indian infrastructure.[citation needed] |
| Mistral Large | November 2024 | Mistral AI | 123 | Unknown | Unknown | Mistral Research License | Upgraded over time. The latest version is 24.11.[113] |
| Pixtral | November 2024 | Mistral AI | 123 | Unknown | Unknown | Mistral Research License | Multimodal. There is also a 12B version, which is under the Apache 2.0 license.[113] |
| Phi-4 | December 12, 2024 | Microsoft | 14[114] | 9.8T tokens | Unknown | MIT | Microsoft markets them as "small language models".[115] |
| DeepSeek-V3 | December 2024 | DeepSeek | 671 | 14.8T tokens | 56,000 | MIT | 2.788M hours on H800 GPUs.[116] Originally released under the DeepSeek License, then re-released under the MIT License as "DeepSeek-V3-0324" in March 2025.[117] |
| Amazon Nova | December 2024 | Amazon | Unknown | Unknown | Unknown | Proprietary | Includes three models: Nova Micro, Nova Lite, and Nova Pro.[118] |
| DeepSeek-R1 | January 2025 | DeepSeek | 671 | Not applicable | Unknown | MIT | No pretraining; reinforcement-learned upon V3-Base.[119][120] |
| Qwen2.5 | January 2025 | Alibaba | 72 | 18T tokens | Unknown | Qwen License | 7 dense models, with parameter counts from 0.5B to 72B. They also released 2 MoE variants.[121] |
| MiniMax-Text-01 | January 2025 | Minimax | 456 | 4.7T tokens[122] | Unknown | Minimax Model license[123][122] | |
| Gemini 2.0 | February 2025 | Google DeepMind | Unknown | Unknown | Unknown | Proprietary | Three models released: Flash, Flash-Lite and Pro.[124][125][126] |
| Claude 3.7 | February 24, 2025 | Anthropic | Unknown | Unknown | Unknown | Proprietary | One model, Sonnet 3.7.[127] |
| YandexGPT 5 Lite Pretrain and Pro | February 25, 2025 | Yandex | Unknown | Unknown | Unknown | Proprietary | Used in the Alice Neural Network chatbot. |
| GPT-4.5 | February 27, 2025 | OpenAI | Unknown | Unknown | Unknown | Proprietary | Largest non-reasoning model.[128] |
| Grok 3 | February 2025 | xAI | Unknown | Unknown | Unknown | Proprietary | Training claimed to use "10x the compute of previous state-of-the-art models".[129] |
| Gemini 2.5 | March 25, 2025 | Google DeepMind | Unknown | Unknown | Unknown | Proprietary | Three models released: Flash, Flash-Lite and Pro.[130] |
| YandexGPT 5 Lite Instruct | March 31, 2025 | Yandex | Unknown | Unknown | Unknown | Proprietary | Used in the Alice Neural Network chatbot. |
| Llama 4 | April 5, 2025 | Meta AI | 400 | 40T tokens | Unknown | Llama 4 license[131][132] | |
| OpenAI o3 and o4-mini | April 16, 2025 | OpenAI | Unknown | Unknown | Unknown | Proprietary | Reasoning models.[133] |
| Qwen3 | April 2025 | Alibaba Cloud | 235 | 36T tokens | Unknown | Apache 2.0 | Multiple sizes, the smallest being 0.6B.[134] |
| Claude 4 | May 22, 2025 | Anthropic | Unknown | Unknown | Unknown | Proprietary | Includes two models, Sonnet and Opus.[135] |
| Sarvam-M | May 23, 2025 | Sarvam AI | 24 | Unknown | Unknown | Unknown | Hybrid reasoning model fine-tuned on a Mistral Small base; optimized for math, programming, and Indian languages.[citation needed] |
| Grok 4 | July 9, 2025 | xAI | Unknown | Unknown | Unknown | Proprietary | |
| GLM-4.5 | July 29, 2025 | Zhipu AI | 355 | 22T tokens | Unknown | MIT | Released in 355B and 106B sizes.[136] Corpus size was calculated by combining the 15 trillion tokens and the 7 trillion tokens of the pre-training mix.[137] |
| GPT-OSS | August 5, 2025 | OpenAI | 117 | Unknown | Unknown | Apache 2.0 | Released in 20B and 120B sizes.[138] |
| Claude 4.1 | August 5, 2025 | Anthropic | Unknown | Unknown | Unknown | Proprietary | Includes one model, Opus.[139] |
| GPT-5 | August 7, 2025 | OpenAI | Unknown | Unknown | Unknown | Proprietary | Includes three models: GPT-5, GPT-5 mini, and GPT-5 nano. GPT-5 is available in ChatGPT and the API. It includes thinking abilities.[140][141] |
| DeepSeek-V3.1 | August 21, 2025 | DeepSeek | 671 | 15.639T | | MIT | Training size: the 14.8T tokens of DeepSeek-V3 plus 839B tokens from the extension phases (630B + 209B).[142] A hybrid model that can switch between thinking and non-thinking modes.[143] |
| YandexGPT 5.1 Pro | August 28, 2025 | Yandex | Unknown | Unknown | Unknown | Proprietary | Used in the Alice Neural Network chatbot. |
| Apertus | September 2, 2025 | ETH Zurich and EPF Lausanne | 70 | 15 trillion[144] | Unknown | Apache 2.0 | Said to be the first LLM compliant with the EU's Artificial Intelligence Act.[145] |
| Claude Sonnet 4.5 | September 29, 2025 | Anthropic | Unknown | Unknown | Unknown | Proprietary | [146] |
| DeepSeek-V3.2-Exp | September 29, 2025 | DeepSeek | 685 | | | MIT | Experimental model built upon V3.1-Terminus; uses a custom efficient attention mechanism called DeepSeek Sparse Attention (DSA).[147][148][149] |
| GLM-4.6 | September 30, 2025 | Zhipu AI | 357 | | | Apache 2.0 | [150][151][152] |
| Alice AI LLM 1.0 | October 28, 2025 | Yandex | Unknown | Unknown | Unknown | Proprietary | Available in the Alice AI chatbot. |
| Gemini 3 | November 18, 2025 | Google DeepMind | Unknown | Unknown | Unknown | Proprietary | Two models released: Deep Think and Pro.[153] |
| Claude Opus 4.5 | November 24, 2025 | Anthropic | Unknown | Unknown | Unknown | Proprietary | The largest model in the Claude family.[154] |
| GPT-5.2 | December 11, 2025 | OpenAI | Unknown | Unknown | Unknown | Proprietary | Solved an open problem in statistical learning theory that had previously remained unresolved by human researchers.[155] |
| GLM-4.7 | December 22, 2025 | Zhipu AI | 355 | | | Apache 2.0 | MoE architecture. Open-source SOTA on coding benchmarks. A Flash variant (30B-A3B) was also released on January 19, 2026. |
| Qwen3-Max-Thinking | January 26, 2026 | Alibaba Cloud | Unknown | Unknown | Unknown | Proprietary | Proprietary reasoning model with adaptive tool use, test-time scaling, and iterative self-reflection.[156] |
| Kimi K2.5 | January 27, 2026 | Moonshot AI | 1000 | 15T tokens | | Modified MIT License | MoE with 32B active parameters per token. Agent Swarm technology coordinating up to 100 parallel sub-agents. Natively multimodal.[citation needed] |
| Claude Opus 4.6 | February 5, 2026 | Anthropic | Unknown | Unknown | Unknown | Proprietary | |
| GPT-5.3-Codex | February 5, 2026 | OpenAI | Unknown | Unknown | Unknown | Proprietary | |
| Sarvam-2B | January 2026 | Sarvam AI | 2 | Unknown | Unknown | Unknown | Audio-first LLM supporting 22 Indian languages (speech focus).[citation needed] |
| Sovereign LLM | January 2026 | Sarvam AI | 70 | Unknown | Unknown | Unknown | A foundational model sponsored by the IndiaAI Mission; multiple variants planned (Large, Small, Edge).[citation needed] |


Notes

  a. ^ This is the date that documentation describing the model's architecture was first released.
  b. ^ In many cases, researchers release or report on multiple versions of a model having different sizes. In these cases, the size of the largest model is listed here.
  c. ^ This is the license of the pre-trained model weights. In almost all cases the training code itself is open-source or can be easily replicated. LLMs may be licensed differently from the chatbots that use them; for the licenses of chatbots, see List of chatbots.
  d. ^ The smaller models, including 66B, are publicly available, while the 175B model is available on request.
  e. ^ Facebook's license and distribution scheme restricted access to approved researchers, but the model weights were leaked and became widely available.
  f. ^ As stated in the technical report: "Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method ..."[59]

References

  1. ^"Improving language understanding with unsupervised learning".openai.com. June 11, 2018.Archived from the original on 2023-03-18. Retrieved2023-03-18.
  2. ^"finetune-transformer-lm".GitHub.Archived from the original on 19 May 2023. Retrieved2 January 2024.
  3. ^Radford, Alec (11 June 2018)."Improving language understanding with unsupervised learning".OpenAI. Retrieved18 November 2025.
  4. ^abDevlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina (11 October 2018). "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding".arXiv:1810.04805v2 [cs.CL].
  5. ^Prickett, Nicole Hemsoth (2021-08-24)."Cerebras Shifts Architecture To Meet Massive AI/ML Models".The Next Platform.Archived from the original on 2023-06-20. Retrieved2023-06-20.
  6. ^"BERT". March 13, 2023.Archived from the original on January 13, 2021. RetrievedMarch 13, 2023 – via GitHub.
  7. ^Manning, Christopher D. (2022)."Human Language Understanding & Reasoning".Daedalus.151 (2):127–138.doi:10.1162/daed_a_01905.S2CID 248377870.Archived from the original on 2023-11-17. Retrieved2023-03-09.
  8. ^Patel, Ajay; Li, Bryan; Rasooli, Mohammad Sadegh; Constant, Noah; Raffel, Colin; Callison-Burch, Chris (2022). "Bidirectional Language Models Are Also Few-shot Learners".arXiv:2209.14500 [cs.LG].
  9. ^Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina (11 October 2018). "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding".arXiv:1810.04805v2 [cs.CL].
  10. ^abRaffel, Colin; Shazeer, Noam; Roberts, Adam; Lee, Katherine; Narang, Sharan; Matena, Michael; Zhou, Yanqi; Li, Wei; Liu, Peter J. (2020)."Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer".Journal of Machine Learning Research.21 (140):1–67.arXiv:1910.10683.ISSN 1533-7928.
  11. ^google-research/text-to-text-transfer-transformer, Google Research, 2024-04-02,archived from the original on 2024-03-29, retrieved2024-04-04
  12. ^"Imagen: Text-to-Image Diffusion Models".imagen.research.google.Archived from the original on 2024-03-27. Retrieved2024-04-04.
  13. ^"Pretrained models — transformers 2.0.0 documentation".huggingface.co.Archived from the original on 2024-08-05. Retrieved2024-08-05.
  14. ^"xlnet".GitHub.Archived from the original on 2 January 2024. Retrieved2 January 2024.
  15. ^Yang, Zhilin; Dai, Zihang; Yang, Yiming; Carbonell, Jaime; Salakhutdinov, Ruslan; Le, Quoc V. (2 January 2020). "XLNet: Generalized Autoregressive Pretraining for Language Understanding".arXiv:1906.08237 [cs.CL].
  16. ^"GPT-2: 1.5B Release".OpenAI. 2019-11-05.Archived from the original on 2019-11-14. Retrieved2019-11-14.
  17. ^"Better language models and their implications".openai.com.Archived from the original on 2023-03-16. Retrieved2023-03-13.
  18. ^ab"OpenAI's GPT-3 Language Model: A Technical Overview".lambdalabs.com. 3 June 2020.Archived from the original on 27 March 2023. Retrieved13 March 2023.
  19. ^ab"openai-community/gpt2-xl · Hugging Face".huggingface.co.Archived from the original on 2024-07-24. Retrieved2024-07-24.
  20. ^"gpt-2".GitHub.Archived from the original on 11 March 2023. Retrieved13 March 2023.
  21. ^Wiggers, Kyle (28 April 2022)."The emerging types of language models and why they matter".TechCrunch.Archived from the original on 16 March 2023. Retrieved9 March 2023.
  22. ^Table D.1 inBrown, Tom B.; Mann, Benjamin; Ryder, Nick; Subbiah, Melanie; Kaplan, Jared; Dhariwal, Prafulla; Neelakantan, Arvind; Shyam, Pranav; Sastry, Girish; Askell, Amanda; Agarwal, Sandhini; Herbert-Voss, Ariel; Krueger, Gretchen; Henighan, Tom; Child, Rewon; Ramesh, Aditya; Ziegler, Daniel M.; Wu, Jeffrey; Winter, Clemens; Hesse, Christopher; Chen, Mark; Sigler, Eric; Litwin, Mateusz; Gray, Scott; Chess, Benjamin; Clark, Jack; Berner, Christopher; McCandlish, Sam; Radford, Alec; Sutskever, Ilya; Amodei, Dario (May 28, 2020). "Language Models are Few-Shot Learners".arXiv:2005.14165v4 [cs.CL].
  23. ^"ChatGPT: Optimizing Language Models for Dialogue".OpenAI. 2022-11-30.Archived from the original on 2022-11-30. Retrieved2023-01-13.
  24. ^"GPT Neo". March 15, 2023.Archived from the original on March 12, 2023. RetrievedMarch 12, 2023 – via GitHub.
  25. ^abcGao, Leo; Biderman, Stella; Black, Sid; Golding, Laurence; Hoppe, Travis; Foster, Charles; Phang, Jason; He, Horace; Thite, Anish; Nabeshima, Noa; Presser, Shawn; Leahy, Connor (31 December 2020). "The Pile: An 800GB Dataset of Diverse Text for Language Modeling".arXiv:2101.00027 [cs.CL].
  26. ^abIyer, Abhishek (15 May 2021)."GPT-3's free alternative GPT-Neo is something to be excited about".VentureBeat.Archived from the original on 9 March 2023. Retrieved13 March 2023.
  27. ^"GPT-J-6B: An Introduction to the Largest Open Source GPT Model | Forefront".www.forefront.ai. Archived fromthe original on 2023-03-09. Retrieved2023-02-28.
  28. ^abcdDey, Nolan; Gosal, Gurpreet; Zhiming; Chen; Khachane, Hemant; Marshall, William; Pathria, Ribhu; Tom, Marvin; Hestness, Joel (2023-04-01). "Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster".arXiv:2304.03208 [cs.LG].
  29. ^Alvi, Ali; Kharya, Paresh (11 October 2021)."Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World's Largest and Most Powerful Generative Language Model".Microsoft Research.Archived from the original on 13 March 2023. Retrieved13 March 2023.
  30. ^abSmith, Shaden; Patwary, Mostofa; Norick, Brandon; LeGresley, Patrick; Rajbhandari, Samyam; Casper, Jared; Liu, Zhun; Prabhumoye, Shrimai; Zerveas, George; Korthikanti, Vijay; Zhang, Elton; Child, Rewon; Aminabadi, Reza Yazdani; Bernauer, Julie; Song, Xia (2022-02-04). "Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model".arXiv:2201.11990 [cs.CL].
  31. ^abRajbhandari, Samyam; Li, Conglong; Yao, Zhewei; Zhang, Minjia; Aminabadi, Reza Yazdani; Awan, Ammar Ahmad; Rasley, Jeff; He, Yuxiong (2022-07-21),DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale,arXiv:2201.05596
  32. ^Wang, Shuohuan; Sun, Yu; Xiang, Yang; Wu, Zhihua; Ding, Siyu; Gong, Weibao; Feng, Shikun; Shang, Junyuan; Zhao, Yanbin; Pang, Chao; Liu, Jiaxiang; Chen, Xuyi; Lu, Yuxiang; Liu, Weixin; Wang, Xi; Bai, Yangfan; Chen, Qiuliang; Zhao, Li; Li, Shiyong; Sun, Peng; Yu, Dianhai; Ma, Yanjun; Tian, Hao; Wu, Hua; Wu, Tian; Zeng, Wei; Li, Ge; Gao, Wen; Wang, Haifeng (December 23, 2021). "ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation".arXiv:2112.12731 [cs.CL].
  33. ^"Product".Anthropic.Archived from the original on 16 March 2023. Retrieved14 March 2023.
  34. ^abAskell, Amanda; Bai, Yuntao; Chen, Anna; et al. (9 December 2021). "A General Language Assistant as a Laboratory for Alignment".arXiv:2112.00861 [cs.CL].
  35. ^Bai, Yuntao; Kadavath, Saurav; Kundu, Sandipan; et al. (15 December 2022). "Constitutional AI: Harmlessness from AI Feedback".arXiv:2212.08073 [cs.CL].
  36. ^abcDai, Andrew M; Du, Nan (December 9, 2021)."More Efficient In-Context Learning with GLaM".ai.googleblog.com.Archived from the original on 2023-03-12. Retrieved2023-03-09.
  37. ^"Language modelling at scale: Gopher, ethical considerations, and retrieval".www.deepmind.com. 8 December 2021.Archived from the original on 20 March 2023. Retrieved20 March 2023.
  38. ^abcHoffmann, Jordan; Borgeaud, Sebastian; Mensch, Arthur; et al. (29 March 2022). "Training Compute-Optimal Large Language Models".arXiv:2203.15556 [cs.CL].
  39. ^abcdTable 20 and page 66 ofPaLM: Scaling Language Modeling with PathwaysArchived 2023-06-10 at theWayback Machine
  40. ^abCheng, Heng-Tze; Thoppilan, Romal (January 21, 2022)."LaMDA: Towards Safe, Grounded, and High-Quality Dialog Models for Everything".ai.googleblog.com.Archived from the original on 2022-03-25. Retrieved2023-03-09.
  41. ^Thoppilan, Romal; De Freitas, Daniel; Hall, Jamie; Shazeer, Noam; Kulshreshtha, Apoorv; Cheng, Heng-Tze; Jin, Alicia; Bos, Taylor; Baker, Leslie; Du, Yu; Li, YaGuang; Lee, Hongrae; Zheng, Huaixiu Steven; Ghafouri, Amin; Menegali, Marcelo (2022-01-01). "LaMDA: Language Models for Dialog Applications".arXiv:2201.08239 [cs.CL].
  42. ^Black, Sidney; Biderman, Stella; Hallahan, Eric; et al. (2022-05-01).GPT-NeoX-20B: An Open-Source Autoregressive Language Model. Proceedings of BigScience Episode #5 – Workshop on Challenges & Perspectives in Creating Large Language Models. Vol. Proceedings of BigScience Episode #5 – Workshop on Challenges & Perspectives in Creating Large Language Models. pp. 95–136.Archived from the original on 2022-12-10. Retrieved2022-12-19.
  43. ^abcHoffmann, Jordan; Borgeaud, Sebastian; Mensch, Arthur; Sifre, Laurent (12 April 2022)."An empirical analysis of compute-optimal large language model training".Deepmind Blog.Archived from the original on 13 April 2022. Retrieved9 March 2023.
  44. ^Narang, Sharan; Chowdhery, Aakanksha (April 4, 2022)."Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance".ai.googleblog.com.Archived from the original on 2022-04-04. Retrieved2023-03-09.
  45. ^Susan Zhang; Mona Diab; Luke Zettlemoyer."Democratizing access to large-scale language models with OPT-175B".ai.facebook.com.Archived from the original on 2023-03-12. Retrieved2023-03-12.
  46. ^Zhang, Susan; Roller, Stephen; Goyal, Naman; Artetxe, Mikel; Chen, Moya; Chen, Shuohui; Dewan, Christopher; Diab, Mona; Li, Xian; Lin, Xi Victoria; Mihaylov, Todor; Ott, Myle; Shleifer, Sam; Shuster, Kurt; Simig, Daniel; Koura, Punit Singh; Sridhar, Anjali; Wang, Tianlu; Zettlemoyer, Luke (21 June 2022). "OPT: Open Pre-trained Transformer Language Models".arXiv:2205.01068 [cs.CL].
  47. ^"metaseq/projects/OPT/chronicles at main · facebookresearch/metaseq".GitHub. Retrieved2024-10-18.
  48. ^abKhrushchev, Mikhail; Vasilev, Ruslan; Petrov, Alexey; Zinov, Nikolay (2022-06-22),YaLM 100B,archived from the original on 2023-06-16, retrieved2023-03-18
  49. ^abLewkowycz, Aitor; Andreassen, Anders; Dohan, David; Dyer, Ethan; Michalewski, Henryk; Ramasesh, Vinay; Slone, Ambrose; Anil, Cem; Schlag, Imanol; Gutman-Solo, Theo; Wu, Yuhuai; Neyshabur, Behnam; Gur-Ari, Guy; Misra, Vedant (30 June 2022). "Solving Quantitative Reasoning Problems with Language Models".arXiv:2206.14858 [cs.CL].
  50. ^"Minerva: Solving Quantitative Reasoning Problems with Language Models".ai.googleblog.com. 30 June 2022. Retrieved20 March 2023.
  51. ^Ananthaswamy, Anil (8 March 2023)."In AI, is bigger always better?".Nature.615 (7951):202–205.Bibcode:2023Natur.615..202A.doi:10.1038/d41586-023-00641-w.PMID 36890378.S2CID 257380916.Archived from the original on 16 March 2023. Retrieved9 March 2023.
  52. ^"bigscience/bloom · Hugging Face".huggingface.co.Archived from the original on 2023-04-12. Retrieved2023-03-13.
  53. ^Taylor, Ross; Kardas, Marcin; Cucurull, Guillem; Scialom, Thomas; Hartshorn, Anthony; Saravia, Elvis; Poulton, Andrew; Kerkez, Viktor; Stojnic, Robert (16 November 2022). "Galactica: A Large Language Model for Science".arXiv:2211.09085 [cs.CL].
  54. ^"20B-parameter Alexa model sets new marks in few-shot learning".Amazon Science. 2 August 2022.Archived from the original on 15 March 2023. Retrieved12 March 2023.
  55. ^Soltan, Saleh; Ananthakrishnan, Shankar; FitzGerald, Jack; et al. (3 August 2022). "AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model".arXiv:2208.01448 [cs.CL].
  56. ^"AlexaTM 20B is now available in Amazon SageMaker JumpStart | AWS Machine Learning Blog".aws.amazon.com. 17 November 2022.Archived from the original on 13 March 2023. Retrieved13 March 2023.
  57. ^abc"Introducing LLaMA: A foundational, 65-billion-parameter large language model".Meta AI. 24 February 2023.Archived from the original on 3 March 2023. Retrieved9 March 2023.
  58. ^abc"The Falcon has landed in the Hugging Face ecosystem".huggingface.co.Archived from the original on 2023-06-20. Retrieved2023-06-20.
  59. ^"GPT-4 Technical Report"(PDF).OpenAI. 2023.Archived(PDF) from the original on March 14, 2023. RetrievedMarch 14, 2023.
  60. ^Schreiner, Maximilian (2023-07-11)."GPT-4 architecture, datasets, costs and more leaked".THE DECODER.Archived from the original on 2023-07-12. Retrieved2024-07-26.
  61. ^Dey, Nolan (March 28, 2023)."Cerebras-GPT: A Family of Open, Compute-efficient, Large Language Models".Cerebras.Archived from the original on March 28, 2023. RetrievedMarch 28, 2023.
  62. ^"Abu Dhabi-based TII launches its own version of ChatGPT".tii.ae.Archived from the original on 2023-04-03. Retrieved2023-04-03.
  63. ^Penedo, Guilherme; Malartic, Quentin; Hesslow, Daniel; Cojocaru, Ruxandra; Cappelli, Alessandro; Alobeidli, Hamza; Pannier, Baptiste; Almazrouei, Ebtesam; Launay, Julien (2023-06-01). "The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only".arXiv:2306.01116 [cs.CL].
  64. ^"tiiuae/falcon-40b · Hugging Face".huggingface.co. 2023-06-09. Retrieved2023-06-20.
  65. ^UAE's Falcon 40B, World's Top-Ranked AI Model from Technology Innovation Institute, is Now Royalty-FreeArchived 2024-02-08 at theWayback Machine, 31 May 2023
  66. ^Wu, Shijie; Irsoy, Ozan; Lu, Steven; Dabravolski, Vadim; Dredze, Mark; Gehrmann, Sebastian; Kambadur, Prabhanjan; Rosenberg, David; Mann, Gideon (March 30, 2023). "BloombergGPT: A Large Language Model for Finance".arXiv:2303.17564 [cs.LG].
  67. ^Ren, Xiaozhe; Zhou, Pingyi; Meng, Xinfan; Huang, Xinjing; Wang, Yadao; Wang, Weichao; Li, Pengfei; Zhang, Xiaoda; Podolskiy, Alexander; Arshinov, Grigory; Bout, Andrey; Piontkovskaya, Irina; Wei, Jiansheng; Jiang, Xin; Su, Teng; Liu, Qun; Yao, Jun (March 19, 2023). "PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing".arXiv:2303.10845 [cs.CL].
  68. ^Köpf, Andreas; Kilcher, Yannic; von Rütte, Dimitri; Anagnostidis, Sotiris; Tam, Zhi-Rui; Stevens, Keith; Barhoum, Abdullah; Duc, Nguyen Minh; Stanley, Oliver; Nagyfi, Richárd; ES, Shahul; Suri, Sameer; Glushkov, David; Dantuluri, Arnav; Maguire, Andrew (2023-04-14). "OpenAssistant Conversations – Democratizing Large Language Model Alignment".arXiv:2304.07327 [cs.CL].
  69. ^Wrobel, Sharon."Tel Aviv startup rolls out new advanced AI language model to rival OpenAI".The Times of Israel.ISSN 0040-7909.Archived from the original on 2023-07-24. Retrieved2023-07-24.
  70. ^Wiggers, Kyle (2023-04-13)."With Bedrock, Amazon enters the generative AI race".TechCrunch.Archived from the original on 2023-07-24. Retrieved2023-07-24.
  71. ^abElias, Jennifer (16 May 2023)."Google's newest A.I. model uses nearly five times more text data for training than its predecessor".CNBC.Archived from the original on 16 May 2023. Retrieved18 May 2023.
  72. ^"Introducing PaLM 2".Google. May 10, 2023.Archived from the original on May 18, 2023. RetrievedMay 18, 2023.
  73. ^ab"Introducing Llama 2: The Next Generation of Our Open Source Large Language Model".Meta AI. 2023.Archived from the original on 2024-01-05. Retrieved2023-07-19.
  74. ^"llama/MODEL_CARD.md at main · meta-llama/llama".GitHub.Archived from the original on 2024-05-28. Retrieved2024-05-28.
  75. ^"Claude 2".anthropic.com.Archived from the original on 15 December 2023. Retrieved12 December 2023.
  76. ^Nirmal, Dinesh (2023-09-07)."Building AI for business: IBM's Granite foundation models".IBM Blog.Archived from the original on 2024-07-22. Retrieved2024-08-11.
  77. ^"Announcing Mistral 7B".Mistral. 2023.Archived from the original on 2024-01-06. Retrieved2023-10-06.
  78. ^"Introducing Claude 2.1".anthropic.com.Archived from the original on 15 December 2023. Retrieved12 December 2023.
  79. ^xai-org/grok-1, xai-org, 2024-03-19,archived from the original on 2024-05-28, retrieved2024-03-19
  80. ^"Grok-1 model card".x.ai. Retrieved12 December 2023.
  81. ^"Gemini – Google DeepMind".deepmind.google.Archived from the original on 8 December 2023. Retrieved12 December 2023.
  82. ^Franzen, Carl (11 December 2023)."Mistral shocks AI community as latest open source model eclipses GPT-3.5 performance".VentureBeat.Archived from the original on 11 December 2023. Retrieved12 December 2023.
  83. ^"Mixtral of experts".mistral.ai. 11 December 2023.Archived from the original on 13 February 2024. Retrieved12 December 2023.
  84. ^abDeepSeek-AI; Bi, Xiao; Chen, Deli; Chen, Guanting; Chen, Shanhuang; Dai, Damai; Deng, Chengqi; Ding, Honghui; Dong, Kai (2024-01-05),DeepSeek LLM: Scaling Open-Source Language Models with Longtermism,arXiv:2401.02954
  85. ^abHughes, Alyssa (12 December 2023)."Phi-2: The surprising power of small language models".Microsoft Research.Archived from the original on 12 December 2023. Retrieved13 December 2023.
  86. ^"Our next-generation model: Gemini 1.5".Google. 15 February 2024.Archived from the original on 16 February 2024. Retrieved16 February 2024.This means 1.5 Pro can process vast amounts of information in one go — including 1 hour of video, 11 hours of audio, codebases with over 30,000 lines of code or over 700,000 words. In our research, we've also successfully tested up to 10 million tokens.
  87. ^"Gemma" – via GitHub.
  88. ^"Introducing the next generation of Claude".www.anthropic.com.Archived from the original on 2024-03-04. Retrieved2024-03-04.
  89. ^"Databricks Open Model License".Databricks. 27 March 2024. Retrieved6 August 2025.
  90. ^"Databricks Open Model Acceptable Use Policy".Databricks. 27 March 2024. Retrieved6 August 2025.
  91. ^"Fugaku-LLM Terms of Use". 23 April 2024. Retrieved6 August 2025 – viaHugging Face.
  92. ^"Fugaku-LLM/Fugaku-LLM-13B · Hugging Face".huggingface.co.Archived from the original on 2024-05-17. Retrieved2024-05-17.
  93. ^Dickson, Ben (22 May 2024)."Meta introduces Chameleon, a state-of-the-art multimodal model".VentureBeat.
  94. ^"chameleon/LICENSE at e3b711ef63b0bb3a129cf0cf0918e36a32f26e2c · facebookresearch/chameleon". Meta Research. Retrieved6 August 2025 – viaGitHub.
  95. ^AI, Mistral (2024-04-17)."Cheaper, Better, Faster, Stronger".mistral.ai.Archived from the original on 2024-05-05. Retrieved2024-05-05.
  96. ^"Phi-3".azure.microsoft.com. 23 April 2024.Archived from the original on 2024-04-27. Retrieved2024-04-28.
  97. ^"Phi-3 Model Documentation".huggingface.co.Archived from the original on 2024-05-13. Retrieved2024-04-28.
  98. ^"Qwen2".GitHub.Archived from the original on 2024-06-17. Retrieved2024-06-17.
  99. ^DeepSeek-AI; Liu, Aixin; Feng, Bei; Wang, Bin; Wang, Bingxuan; Liu, Bo; Zhao, Chenggang; Dengr, Chengqi; Ruan, Chong (2024-06-19),DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model,arXiv:2405.04434
  100. ^"NVIDIA Open Models License".Nvidia. 16 June 2025. Retrieved6 August 2025.
  101. ^"Trustworthy AI".Nvidia. 27 June 2024. Retrieved6 August 2025.
  102. ^"nvidia/Nemotron-4-340B-Base · Hugging Face".huggingface.co. 2024-06-14.Archived from the original on 2024-06-15. Retrieved2024-06-15.
  103. ^"Nemotron-4 340B | Research".research.nvidia.com.Archived from the original on 2024-06-15. Retrieved2024-06-15.
  104. ^"Introducing Claude 3.5 Sonnet".www.anthropic.com. Retrieved8 August 2025.
  105. ^"Introducing computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku".www.anthropic.com. Retrieved8 August 2025.
  106. ^"The Llama 3 Herd of Models" (July 23, 2024) Llama Team, AI @ Meta
  107. ^"llama-models/models/llama3_1/MODEL_CARD.md at main · meta-llama/llama-models".GitHub.Archived from the original on 2024-07-23. Retrieved2024-07-23.
  108. ^"LICENSE · xai-org/grok-2 at main". 5 November 2025. Retrieved18 November 2025 – viaHugging Face.
  109. ^"xAI Acceptable Use Policy".xAI. 2 January 2025. Retrieved18 November 2025.
  110. ^Weatherbed, Jess (14 August 2024)."xAI's new Grok-2 chatbots bring AI image generation to X".The Verge. Retrieved18 November 2025.
  111. ^Ha, Anthony (24 August 2025)."Elon Musk says xAI has open sourced Grok 2.5".TechCrunch. Retrieved18 November 2025.
  112. ^"Introducing OpenAI o1".openai.com. Retrieved8 August 2025.
  113. ^ab"Models Overview".mistral.ai. Retrieved2025-03-03.
  114. ^"Phi-4 Model Card".huggingface.co. Retrieved2025-11-11.{{cite web}}: CS1 maint: url-status (link)
  115. ^"Introducing Phi-4: Microsoft's Newest Small Language Model Specializing in Complex Reasoning".techcommunity.microsoft.com. Retrieved2025-11-11.{{cite web}}: CS1 maint: url-status (link)
  116. ^deepseek-ai/DeepSeek-V3, DeepSeek, 2024-12-26, retrieved2024-12-26
  117. ^Feng, Coco (25 March 2025)."DeepSeek wows coders with more powerful open-source V3 model".South China Morning Post. Retrieved6 April 2025.
  118. ^Amazon Nova Micro, Lite, and Pro - AWS AI Service Cards3, Amazon, 2024-12-27, retrieved2024-12-27
  119. ^deepseek-ai/DeepSeek-R1, DeepSeek, 2025-01-21, retrieved2025-01-21
  120. ^DeepSeek-AI; Guo, Daya; Yang, Dejian; Zhang, Haowei; Song, Junxiao; Zhang, Ruoyu; Xu, Runxin; Zhu, Qihao; Ma, Shirong (2025-01-22),DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning,arXiv:2501.12948
  121. ^Qwen; Yang, An; Yang, Baosong; Zhang, Beichen; Hui, Binyuan; Zheng, Bo; Yu, Bowen; Li, Chengyuan; Liu, Dayiheng (2025-01-03),Qwen2.5 Technical Report,arXiv:2412.15115
  122. ^abMiniMax; Li, Aonian; Gong, Bangwei; Yang, Bo; Shan, Boji; Liu, Chang; Zhu, Cheng; Zhang, Chunhao; Guo, Congchao (2025-01-14),MiniMax-01: Scaling Foundation Models with Lightning Attention,arXiv:2501.08313
  123. ^MiniMax-AI/MiniMax-01, MiniMax, 2025-01-26, retrieved2025-01-26
  124. ^Kavukcuoglu, Koray (5 February 2025)."Gemini 2.0 is now available to everyone".Google. Retrieved6 February 2025.
  125. ^"Gemini 2.0: Flash, Flash-Lite and Pro".Google for Developers. Retrieved6 February 2025.
  126. ^Franzen, Carl (5 February 2025)."Google launches Gemini 2.0 Pro, Flash-Lite and connects reasoning model Flash Thinking to YouTube, Maps and Search".VentureBeat. Retrieved6 February 2025.
  127. ^"Claude 3.7 Sonnet and Claude Code".www.anthropic.com. Retrieved8 August 2025.
  128. ^"Introducing GPT-4.5".openai.com. Retrieved8 August 2025.
  129. ^"Grok 3 Beta — The Age of Reasoning Agents".x.ai. Retrieved2025-02-22.
  130. ^Kavukcuoglu, Koray (25 March 2025)."Gemini 2.5: Our most intelligent AI model".Google. Retrieved23 September 2025.
  131. ^"meta-llama/Llama-4-Maverick-17B-128E · Hugging Face".huggingface.co. 2025-04-05. Retrieved2025-04-06.
  132. ^"The Llama 4 herd: The beginning of a new era of natively multimodal AI innovation".ai.meta.com. Archived fromthe original on 2025-04-05. Retrieved2025-04-05.
  133. ^"Introducing OpenAI o3 and o4-mini".openai.com. Retrieved8 August 2025.
  134. ^Team, Qwen (2025-04-29)."Qwen3: Think Deeper, Act Faster".Qwen. Retrieved2025-04-29.
  135. ^"Introducing Claude 4".www.anthropic.com. Retrieved8 August 2025.
  136. ^"zai-org/GLM-4.5 · Hugging Face".huggingface.co. 2025-08-04. Retrieved2025-08-06.
  137. ^"GLM-4.5: Reasoning, Coding, and Agentic Abililties".z.ai. Retrieved2025-08-06.
  138. ^Whitwam, Ryan (5 August 2025)."OpenAI announces two "gpt-oss" open AI models, and you can download them today".Ars Technica. Retrieved6 August 2025.
  139. ^"Claude Opus 4.1".www.anthropic.com. Retrieved8 August 2025.
  140. ^"Introducing GPT-5".openai.com. 7 August 2025. Retrieved8 August 2025.
  141. ^"OpenAI Platform: GPT-5 Model Documentation".openai.com. Retrieved18 August 2025.
  142. ^"deepseek-ai/DeepSeek-V3.1 · Hugging Face".huggingface.co. 2025-08-21. Retrieved2025-08-25.
  143. ^"DeepSeek-V3.1 Release | DeepSeek API Docs".api-docs.deepseek.com. Retrieved2025-08-25.
  144. ^"Apertus: Ein vollständig offenes, transparentes und mehrsprachiges Sprachmodell" (in German). Zürich: ETH Zürich. 2025-09-02. Retrieved2025-11-07.
  145. ^Kirchner, Malte (2025-09-02)."Apertus: Schweiz stellt erstes offenes und mehrsprachiges KI-Modell vor".heise online (in German). Retrieved2025-11-07.
  146. ^"Introducing Claude Sonnet 4.5".www.anthropic.com. Retrieved29 September 2025.
  147. ^"Introducing DeepSeek-V3.2-Exp | DeepSeek API Docs".api-docs.deepseek.com. Retrieved2025-10-01.
  148. ^"deepseek-ai/DeepSeek-V3.2-Exp · Hugging Face".huggingface.co. 2025-09-29. Retrieved2025-10-01.
  149. ^"DeepSeek-V3.2-Exp/DeepSeek_V3_2.pdf at main · deepseek-ai/DeepSeek-V3.2-Exp"(PDF).GitHub. Retrieved2025-10-01.
  150. ^"GLM-4.6: Advanced Agentic, Reasoning and Coding Capabilities".z.ai. Retrieved2025-10-01.
  151. ^"zai-org/GLM-4.6 · Hugging Face".huggingface.co. 2025-09-30. Retrieved2025-10-01.
  152. ^"GLM-4.6".modelscope.cn. Retrieved2025-10-01.
  153. ^"A new era of intelligence with Gemini 3".Google. 18 November 2025. Retrieved5 January 2026.
  154. ^"Introducing Claude Opus 4.5".www.anthropic.com. Retrieved8 January 2026.
  155. ^"Advancing science and math with GPT-5.2".openai.com. Retrieved4 January 2026.
  156. ^"Pushing Qwen3-Max-Thinking Beyond its Limits".Qwen. 25 January 2026.Archived from the original on 6 February 2026. Retrieved6 February 2026.We further enhance Qwen3-Max-Thinking with two key innovations: (1) adaptive tool-use capabilities [...]; and (2) advanced test-time scaling techniques [...]. [...] We limit [parallel trajectories] and redirect saved computation to iterative self-reflection guided by a "take-experience" mechanism.