GPT-1

From Wikipedia, the free encyclopedia

2018 text-generating language model

Generative Pre-trained Transformer 1 (GPT-1)
Original author(s)	OpenAI
Initial release	June 2018
Repository	github.com/openai/finetune-transformer-lm
Successor	GPT-2
Type	Large language model, generative pre-trained transformer
License	MIT^[1]
Website	openai.com/blog/language-unsupervised/

Generative Pre-trained Transformer 1 (GPT-1) was the first of OpenAI's large language models following Google's invention of the transformer architecture in 2017.^[2] In June 2018, OpenAI released a paper entitled "Improving Language Understanding by Generative Pre-Training",^[3] in which they introduced that initial model along with the general concept of a generative pre-trained transformer.^[4]

Up to that point, the best-performing neural NLP models primarily employed supervised learning from large amounts of manually labeled data. This reliance on supervised learning limited their use of datasets that were not well annotated, and made it prohibitively expensive and time-consuming to train extremely large models.^[3]^[5] Many languages (such as Swahili or Haitian Creole) are difficult to translate and interpret using such models because too little text is available for corpus-building.^[5] In contrast, a GPT's "semi-supervised" approach involved two stages: an unsupervised generative "pre-training" stage, in which a language modeling objective was used to set initial parameters, and a supervised discriminative "fine-tuning" stage, in which these parameters were adapted to a target task.^[3]
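The two stages can be written compactly as training objectives. The formulation below follows the notation of the GPT-1 paper^[3] rather than anything stated in this article, so the symbols should be read as assumptions: U = {u_1, ..., u_n} is the unlabeled token corpus, k the size of the context window, Θ the model parameters, C a labeled dataset whose examples are token sequences x^1, ..., x^m with label y, and λ a weighting hyperparameter. Pre-training maximizes L_1; fine-tuning maximizes L_2, optionally with the language modeling objective kept as an auxiliary term (L_3).

```latex
\begin{align}
L_1(\mathcal{U}) &= \sum_i \log P(u_i \mid u_{i-k}, \ldots, u_{i-1}; \Theta) \\
L_2(\mathcal{C}) &= \sum_{(x, y)} \log P(y \mid x^1, \ldots, x^m) \\
L_3(\mathcal{C}) &= L_2(\mathcal{C}) + \lambda \, L_1(\mathcal{C})
\end{align}
```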

The use of a transformer architecture, as opposed to previous techniques involving attention-augmented RNNs, provided GPT models with a more structured memory than could be achieved through recurrent mechanisms; this resulted in "robust transfer performance across diverse tasks".^[3]

Reason for choosing BookCorpus

BookCorpus was chosen as a training dataset partly because its long passages of continuous text helped the model learn to handle long-range information.^[6] It contained over 7,000 unpublished fiction books from various genres. Other datasets available at the time, while larger, lacked this long-range structure (being "shuffled" at the sentence level).^[3]

The BookCorpus text was cleaned with the ftfy library to standardize punctuation and whitespace, and then tokenized with spaCy.^[3]
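To give a feel for what this kind of preprocessing does, here is a minimal standard-library sketch. It is a stand-in only: it does not call ftfy or spaCy and does not reproduce their behavior. The punctuation map, the function names `clean_text` and `tokenize`, and the regular expressions are all hypothetical choices made for illustration.

```python
import re
import unicodedata

# Hypothetical mapping from typographic punctuation to plain ASCII.
# ftfy applies a much richer set of fixes; this only illustrates the idea.
PUNCT_MAP = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en and em dashes
    "\u2026": "...",                # horizontal ellipsis
}


def clean_text(text: str) -> str:
    """Standardize punctuation and collapse runs of whitespace."""
    text = unicodedata.normalize("NFC", text)
    for src, dst in PUNCT_MAP.items():
        text = text.replace(src, dst)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str) -> list[str]:
    """Split into word tokens and single punctuation marks (a crude tokenizer)."""
    return re.findall(r"\w+|[^\w\s]", text)


raw = "\u201cHello,\u201d  she said\u2026\n"
cleaned = clean_text(raw)
tokens = tokenize(cleaned)
```

A real pipeline would replace `clean_text` with ftfy and `tokenize` with a spaCy tokenizer, as the paper describes.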

Architecture

The GPT-1 architecture was a twelve-layer decoder-only transformer, using twelve masked self-attention heads, with 64-dimensional states each (for a total of 768). Rather than simple stochastic gradient descent, the Adam optimization algorithm was used; the learning rate was increased linearly from zero over the first 2,000 updates to a maximum of 2.5×10⁻⁴, and annealed to 0 using a cosine schedule.^[3] GPT-1 has 117 million parameters.^[4]
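The learning-rate schedule and the parameter count described above can both be checked numerically. The sketch below is not OpenAI's code. The schedule assumes the cosine decay starts right after warmup and ends at a caller-supplied `total_updates` (a hypothetical parameter; the article does not give the total). The parameter count assumes values taken from the paper and released model rather than from this article: a 3,072-dimensional feed-forward layer, a 512-token context, a vocabulary of 40,478 entries, learned position embeddings, and an output layer that shares weights with the token embedding.

```python
import math

MAX_LR = 2.5e-4          # peak learning rate stated in the text
WARMUP_UPDATES = 2000    # linear warmup length stated in the text


def learning_rate(step: int, total_updates: int) -> float:
    """Linear warmup from zero, then cosine annealing to zero."""
    if step < WARMUP_UPDATES:
        return MAX_LR * step / WARMUP_UPDATES
    # Assumption: the cosine phase spans the updates remaining after warmup.
    span = max(1, total_updates - WARMUP_UPDATES)
    progress = min((step - WARMUP_UPDATES) / span, 1.0)
    return 0.5 * MAX_LR * (1.0 + math.cos(math.pi * progress))


def count_parameters(n_layers: int = 12, n_heads: int = 12, d_head: int = 64,
                     d_ff: int = 3072, vocab: int = 40478,
                     n_ctx: int = 512) -> int:
    """Rough parameter count for a decoder-only transformer (assumed layout)."""
    d_model = n_heads * d_head                      # 12 * 64 = 768
    embeddings = vocab * d_model + n_ctx * d_model  # token + position tables
    # Query/key/value projection plus output projection, each with a bias.
    attention = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)
    # Two-layer position-wise feed-forward network with biases.
    feed_forward = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    layer_norms = 2 * (2 * d_model)                 # two per block: scale + shift
    return embeddings + n_layers * (attention + feed_forward + layer_norms)


total = count_parameters()
```

Under these assumptions the total comes to 116,534,784, which rounds to the 117 million figure quoted above.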

While fine-tuning was adapted to specific tasks, pre-training was not; to perform the various tasks, only minimal changes were made to the underlying task-agnostic model architecture.^[3] Despite this, GPT-1 improved on previous benchmarks in several language processing tasks, outperforming discriminatively trained models with task-oriented architectures on several diverse tasks.^[3]
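One way the architecture can stay fixed across tasks is to convert each structured input into a single token sequence, which is the approach the GPT-1 paper^[3] describes. The sketch below illustrates that idea only; the marker strings and function names are hypothetical, and the paper's own special tokens are learned embeddings rather than literal strings.

```python
# Hypothetical marker strings standing in for the special start,
# delimiter, and extract tokens.
START, DELIM, EXTRACT = "<s>", "<$>", "<e>"


def classification_input(text: list[str]) -> list[str]:
    """Single-text tasks: wrap the tokens in start and extract markers."""
    return [START, *text, EXTRACT]


def entailment_input(premise: list[str], hypothesis: list[str]) -> list[str]:
    """Sentence-pair tasks: join premise and hypothesis with a delimiter."""
    return [START, *premise, DELIM, *hypothesis, EXTRACT]


def similarity_inputs(a: list[str], b: list[str]) -> list[list[str]]:
    """Similarity has no inherent order, so both orderings are produced."""
    return [entailment_input(a, b), entailment_input(b, a)]


def multiple_choice_inputs(context: list[str],
                           answers: list[list[str]]) -> list[list[str]]:
    """Question answering: one sequence per candidate answer."""
    return [[START, *context, DELIM, *answer, EXTRACT] for answer in answers]


pair = entailment_input(["it", "rains"], ["it", "is", "wet"])
```

Each resulting sequence is fed through the same pre-trained model, and only a small task-specific output layer is added on top.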

Performance and evaluation

GPT-1 achieved improvements of 5.8% and 1.5% over previous best results^[3] on natural language inference (also known as textual entailment) tasks, which evaluate the ability to interpret pairs of sentences from various datasets and classify the relationship between them as "entailment", "contradiction" or "neutral".^[3] Examples of such datasets include QNLI (Wikipedia articles) and MultiNLI (transcribed speech, popular fiction, and government reports, among other sources).^[7] It similarly outperformed previous models on two tasks related to question answering and commonsense reasoning: by 5.7% on RACE,^[8] a dataset of written question-answer pairs from middle and high school exams, and by 8.9% on the StoryCloze Test.^[9]

GPT-1 improved on previous best-performing models by 4.2% on semantic similarity (or paraphrase detection), evaluating the ability to predict whether two sentences are paraphrases of one another, using the Quora Question Pairs (QQP) dataset.^[3]

GPT-1 achieved a score of 45.4, versus a previous best of 35.0,^[3] in a text classification task using the Corpus of Linguistic Acceptability (CoLA). Finally, GPT-1 achieved an overall score of 72.8 (compared to a previous record of 68.9) on GLUE, a multi-task test.^[10]

References

[1] "gpt-2". GitHub. Archived from the original on 11 March 2023. Retrieved 13 March 2023.
[2] Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; Jones, Llion; Gomez, Aidan N; Kaiser, Łukasz; Polosukhin, Illia (2017). "Attention is All you Need" (PDF). Advances in Neural Information Processing Systems. 30. Curran Associates, Inc.
[3] Radford, Alec; Narasimhan, Karthik; Salimans, Tim; Sutskever, Ilya (11 June 2018). "Improving Language Understanding by Generative Pre-Training" (PDF). OpenAI. p. 12. Archived (PDF) from the original on 26 January 2021. Retrieved 23 January 2021.
[4] "GPT-1 to GPT-4: Each of OpenAI's GPT Models Explained and Compared". 11 April 2023. Archived from the original on 15 April 2023. Retrieved 29 April 2023.
[5] Tsvetkov, Yulia (22 June 2017). "Opportunities and Challenges in Working with Low-Resource Languages" (PDF). Carnegie Mellon University. Archived (PDF) from the original on 31 March 2020. Retrieved 23 January 2021.
[6] Zhu, Yukun; Kiros, Ryan; Zemel, Richard; Salakhutdinov, Ruslan; Urtasun, Raquel; Torralba, Antonio; Fidler, Sanja (22 June 2015). "Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books". arXiv:1506.06724 [cs.CV]. "# of books: 11,038 / # of sentences: 74,004,228 / # of words: 984,846,357 / mean # of words per sentence: 13 / median # of words per sentence: 11"
[7] Williams, Adina; Nangia, Nikita; Bowman, Samuel (1 June 2018). "A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference" (PDF). Association for Computational Linguistics. Archived (PDF) from the original on 11 February 2020. Retrieved 23 January 2021. "At 433k examples, this resource is one of the largest corpora available for natural language inference (a.k.a. recognizing textual entailment), [...] offering data from ten distinct genres of written and spoken English [...] while supplying an explicit setting for evaluating cross-genre domain adaptation."
[8] Lai, Guokun; Xie, Qizhe; Hanxiao, Liu; Yang, Yiming; Hovy, Eduard (15 April 2017). "RACE: Large-scale ReAding Comprehension Dataset From Examinations". arXiv:1704.04683 [cs.CL].
[9] Mostafazadeh, Nasrin; Roth, Michael; Louis, Annie; Chambers, Nathanael; Allen, James F. (3 April 2017). "LSDSem 2017 Shared Task: The Story Cloze Test" (PDF). Association for Computational Linguistics. Archived (PDF) from the original on 22 November 2020. Retrieved 23 January 2021. "The LSDSem'17 shared task is the Story Cloze Test, a new evaluation for story understanding and script learning. This test provides a system with a four-sentence story and two possible endings, and the system must choose the correct ending. Successful narrative understanding (getting closer to human performance of 100%) requires systems to link various levels of semantics to commonsense knowledge."
[10] Wang, Alex; Singh, Amanpreet; Michael, Julian; Hill, Felix; Levy, Omar; Bowman, Samuel R. (20 April 2018). "GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding". arXiv:1804.07461 [cs.CL].
