
EleutherAI Open-Sources Six Billion Parameter GPT-3 Clone GPT-J


Jul 13, 2021 · 2 min read


A team of researchers from EleutherAI has open-sourced GPT-J, a six-billion parameter natural language processing (NLP) AI model based on GPT-3. The model was trained on an 800GB open-source text dataset and has performance comparable to a GPT-3 model of similar size.

Developer Aran Komatsuzaki announced the release on his blog. The model was trained on EleutherAI's Pile dataset using Google Cloud's v3-256 TPUs; training took approximately five weeks. On common NLP benchmark tasks, GPT-J achieves an accuracy similar to OpenAI's published results for their 6.7B parameter version of GPT-3. EleutherAI's release includes the model code, pre-trained weight files, a Colab notebook, and a demo website. According to Komatsuzaki,

GPT-J is the best-performing publicly available Transformer [language model] in terms of zero-shot performance on various [down-stream] tasks.

OpenAI first published a paper on generative pre-trained transformers (GPT), an unsupervised learning model that achieved state-of-the-art results on several NLP tasks, in 2018. In early 2019, OpenAI announced a 1.5B parameter model called GPT-2. OpenAI initially declined to release the largest trained model, citing "concerns about malicious applications of the technology," but did release the model later that year. Last year, OpenAI announced a 175B parameter model, GPT-3, but again did not release the trained model files. Instead, OpenAI provided an API that allows developers to integrate the model into their code via web service calls.

EleutherAI, a "decentralized grassroots collective of volunteer researchers," released their first implementation of a GPT-like system, the 2.7B parameter GPT-Neo model, in March 2021. GPT-Neo was implemented in TensorFlow and trained on TPUs using the parallel library Mesh TensorFlow. The team also began developing GPT-NeoX, a GPU-based implementation that uses Microsoft's DeepSpeed; although the code is open-sourced, there are currently no model files available.

The latest model, GPT-J, was trained using a new library, Mesh-Transformer-JAX. The library uses Google's JAX linear algebra framework instead of a dedicated deep-learning framework such as TensorFlow. Komatsuzaki claims that GPT-J provides "more flexible and faster inference than Tensorflow," and that developing the model took much less time than previous projects. Compared to the 2.7B parameter GPT-Neo model, GPT-J shows a 125% improvement in training efficiency.
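To give a flavor of the approach, the sketch below writes a single self-attention head directly in JAX and compiles it with `jax.jit`. This is an illustrative example, not EleutherAI's actual code; all function and parameter names here are hypothetical. The point is that plain `jax.numpy` functions, once jitted, compile via XLA to CPU, GPU, or TPU without framework-specific model code.

```python
import jax
import jax.numpy as jnp

def attention(params, x):
    """Single-head self-attention over a sequence x of shape (seq, d_model)."""
    q = x @ params["wq"]                          # queries
    k = x @ params["wk"]                          # keys
    v = x @ params["wv"]                          # values
    scores = q @ k.T / jnp.sqrt(q.shape[-1])      # scaled dot-product
    weights = jax.nn.softmax(scores, axis=-1)     # attention distribution
    return weights @ v

# jax.jit traces the function once and compiles it with XLA; the same
# code can then run on CPU, GPU, or a TPU pod slice.
fast_attention = jax.jit(attention)

d = 16
key = jax.random.PRNGKey(0)
params = {name: jax.random.normal(k, (d, d))
          for name, k in zip(["wq", "wk", "wv"], jax.random.split(key, 3))}
x = jax.random.normal(key, (8, d))   # a sequence of 8 token embeddings
out = fast_attention(params, x)
print(out.shape)  # (8, 16)
```

Because parameters are passed in as an ordinary dictionary (a JAX "pytree") rather than held in framework-managed layers, sharding them across devices, as Mesh-Transformer-JAX does for the full 6B-parameter model, becomes a data-layout decision rather than a model-code change.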

In response to concerns about the misuse of its models, EleutherAI co-founder Connor Leahy posted a justification of the release on the organization's blog. Leahy noted that GPT-like models are "simple and theoretically straight-forward," making it infeasible to keep the technology out of the hands of bad actors. Instead, EleutherAI's goal is to enable more widespread safety research, especially for "low-resource" researchers. Leahy also pointed out that many well-funded organizations have already trained even larger models than GPT-3, including Microsoft, NVIDIA, and Google.

In a Twitter discussion about the release, a user asked about the hardware requirements for running the model. Komatsuzaki replied:

For inference, in principle you can modify the code to run it on any hardware that can hold a bit more than 12GB of memory. Best throughput can be achieved with TPUs, in which case you can just run as is. Fine-tuning is more demanding: you need at least TPU v3-8 to do that.
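The "bit more than 12GB" figure is consistent with simple back-of-the-envelope arithmetic, assuming the roughly six billion parameters are held as 16-bit floats. This is an illustrative estimate of weight memory only (activations and runtime overhead come on top), not a statement about the released checkpoint format:

```python
# Rough estimate of GPT-J's weight memory, assuming ~6e9 parameters
# stored in half precision (2 bytes each).
n_params = 6e9            # roughly six billion parameters
bytes_per_param = 2       # fp16 / bf16
total_gb = n_params * bytes_per_param / 1e9
print(f"{total_gb:.0f} GB of weights")  # 12 GB of weights
```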

The GPT-J code and models are available on GitHub. EleutherAI's website hosts an interactive demo of the model's text generation capabilities.
