| GPT-J | |
|---|---|
| Developer | EleutherAI |
| Initial release | June 9, 2021 |
| Type | Large language model |
| License | Apache License 2.0 |
| Website | 6b.eleuther.ai |
GPT-J or GPT-J-6B is an open-source large language model (LLM) developed by EleutherAI in 2021.[1] As the name suggests, it is a generative pre-trained transformer model designed to produce human-like text that continues from a prompt. The optional "6B" in the name refers to its 6 billion parameters.[2] The model is available on GitHub, but the web interface no longer communicates with the model. Development stopped in 2021.[3]
GPT-J is a GPT-3-like model with 6 billion parameters.[4] Like GPT-3, it is an autoregressive, decoder-only transformer model designed to solve natural language processing (NLP) tasks by predicting how a piece of text will continue.[1]
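As a rough illustration of this prompt-continuation behaviour, the sketch below loads the model and asks it to extend a short prompt. It assumes the checkpoint published under the identifier EleutherAI/gpt-j-6B on Hugging Face and the transformers library, neither of which is mentioned in the article; it is a minimal sketch rather than a recommended setup.

```python
# Minimal sketch of prompt continuation with GPT-J, assuming the
# Hugging Face checkpoint "EleutherAI/gpt-j-6B" (an assumption; the
# article itself only mentions the GitHub release).
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")

prompt = "The Eiffel Tower is located in"
inputs = tokenizer(prompt, return_tensors="pt")

# generate() appends one predicted token at a time (autoregression)
output = model.generate(**inputs, max_new_tokens=40, do_sample=True)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```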
Its architecture differs from GPT-3 in three main ways:[1]
- It uses rotary position embeddings (RoPE) rather than learned positional embeddings. EleutherAI reports: "In general we have found that across a large suite of setups including regular, linear, and local self-attention, it either matches or surpasses all other methods currently available for injecting positional information into transformers."
- The attention layer and the feedforward layer of each transformer block are computed in parallel from the same input and their outputs summed, rather than run one after the other.
- It uses dense attention in every layer, whereas GPT-3 alternates dense and locally sparse attention layers.
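As a purely illustrative sketch of the rotary-embedding idea (not EleutherAI's implementation, which differs in details such as how much of each attention head is rotated), the NumPy function below rotates pairs of query/key channels by angles that grow with token position:

```python
import numpy as np

def apply_rope(x, base=10000.0):
    """Rotate channel pairs of x (shape: seq_len x dim) by angles that
    grow with token position, so position is encoded directly in the
    query/key vectors instead of via learned position vectors."""
    seq_len, dim = x.shape
    half = dim // 2
    # one rotation frequency per channel pair
    inv_freq = 1.0 / (base ** (np.arange(half) * 2.0 / dim))
    angles = np.outer(np.arange(seq_len), inv_freq)   # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out

# queries for 8 positions with 64 channels (illustrative sizes only)
q = apply_rope(np.random.randn(8, 64))
```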
Beyond these differences, the model has 28 transformer layers and 16 attention heads. Its vocabulary size is 50257 tokens, the same size as GPT-2's.[2] It has a context window size of 2048 tokens.[7]
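These figures are consistent with a total of roughly six billion parameters. The back-of-the-envelope calculation below assumes a hidden size of 4096, a feed-forward width of four times the hidden size, and an untied output projection; none of these values are stated in the article, and they are used here only to show how the count is reached.

```python
# Rough parameter count from the hyperparameters above. The hidden size
# of 4096, the 4x feed-forward width, and the untied output head are
# assumptions not stated in the article; biases and layer norms are ignored.
d_model, n_layers, vocab = 4096, 28, 50257

embeddings  = vocab * d_model             # input token embedding matrix
attention   = 4 * d_model * d_model       # Q, K, V and output projections
feedforward = 2 * d_model * 4 * d_model   # up- and down-projection
lm_head     = vocab * d_model             # output projection over the vocabulary

total = embeddings + n_layers * (attention + feedforward) + lm_head
print(f"~{total / 1e9:.1f} billion parameters")   # prints ~6.0 billion
```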
It was trained on the Pile dataset,[2][4] using the Mesh Transformer JAX library in JAX to handle the parallelization scheme.[2][8]
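The Mesh Transformer JAX code itself is not reproduced here; as a generic sketch of the kind of single-program, multi-device parallelism JAX provides, the snippet below shards a training step across devices and averages gradients with a collective. The loss function is a made-up stand-in, not GPT-J's.

```python
# Generic data-parallel sketch in JAX; a stand-in illustration, not the
# actual Mesh Transformer JAX parallelization scheme used to train GPT-J.
import functools
import jax
import jax.numpy as jnp

def loss_fn(params, inputs, targets):
    # hypothetical linear "model" standing in for the transformer
    preds = inputs @ params
    return jnp.mean((preds - targets) ** 2)

@functools.partial(jax.pmap, axis_name="devices")
def parallel_step(params, inputs, targets):
    grads = jax.grad(loss_fn)(params, inputs, targets)
    # average gradients across the devices holding different batch shards
    return jax.lax.pmean(grads, axis_name="devices")
```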
GPT-J was designed to generate English text from a prompt. It was not designed for translation, for generating text in other languages, or for use on a specific task without first fine-tuning the model for that task.[2] Nonetheless, GPT-J performs reasonably well even without fine-tuning, including in translation (at least from English to French).[9]
When neither is fine-tuned, GPT-J-6B performs almost as well as the 6.7 billion parameter GPT-3 (Curie) on a variety of tasks.[4] It even outperforms the 175 billion parameter GPT-3 (Davinci) on code generation tasks.[10] With fine-tuning, it outperforms an untuned GPT-3 (Davinci) on a number of tasks.[1]
Like all LLMs, it is not programmed to give factually accurate information, only to generate text based on probability.[2]
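As a toy illustration of that point, the sketch below turns made-up next-token scores into a probability distribution and samples from it; nothing in the process checks whether the sampled continuation is true.

```python
import numpy as np

# Toy illustration of probability-based generation; the scores and the
# three candidate tokens are made up for the example.
logits = np.array([3.1, 2.7, 0.4])               # model's raw next-token scores
probs = np.exp(logits) / np.exp(logits).sum()    # softmax -> probabilities
candidates = ["Paris", "Lyon", "banana"]
next_token = np.random.choice(candidates, p=probs)  # sampled, not fact-checked
print(next_token)
```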
The untuned GPT-J is available on EleutherAI's website,[11] NVIDIA's Triton Inference Server,[12] and NLP Cloud's website.[13] Cerebras[1] and Amazon Web Services[14][15] offer services to fine-tune the GPT-J model for company-specific tasks. Graphcore offers both fine-tuning and hosting services for the untuned GPT-J, as well as hosting for fine-tuned models once they are produced.[16] CoreWeave offers hosting services for both the untuned GPT-J and fine-tuned variants.[17][18]
In March 2023, Databricks released Dolly, an Apache-licensed, instruction-following model created by fine-tuning GPT-J on the Stanford Alpaca dataset.[19] NovelAI's Sigurd[20] and Genji-JP 6B[21] models are both fine-tuned versions of GPT-J; NovelAI also offers fine-tuning services to produce and host custom models.[22]
EleutherAI has received praise from Cerebras,[1] GPT-3 Demo,[4] NLP Cloud,[13] and Databricks[19] for making the model open-source, and its open-source status is often cited as a major advantage when choosing which model to use.[10][16][23]