EleutherAI Open-Sources Six Billion Parameter GPT-3 Clone GPT-J

Jul 13, 2021 · 2 min read

A team of researchers from EleutherAI has open-sourced GPT-J, a six-billion-parameter natural language processing (NLP) AI model based on GPT-3. The model was trained on an 800GB open-source text dataset and has performance comparable to a GPT-3 model of similar size.

Developer Aran Komatsuzaki announced the release on his blog. The model was trained on EleutherAI's Pile dataset using Google Cloud's v3-256 TPUs; training took approximately five weeks. On common NLP benchmark tasks, GPT-J achieves accuracy similar to OpenAI's published results for their 6.7B-parameter version of GPT-3. EleutherAI's release includes the model code, pre-trained weight files, a Colab notebook, and a demo website. According to Komatsuzaki,

GPT-J is the best-performing publicly available Transformer [language model] in terms of zero-shot performance on various [down-stream] tasks.

OpenAI first published a paper on generative pre-trained transformers (GPT), an unsupervised learning model that achieved state-of-the-art results on several NLP tasks, in 2018. In early 2019, OpenAI announced a 1.5B-parameter model called GPT-2. OpenAI initially declined to release the largest trained model, citing "concerns about malicious applications of the technology," but did release the model later that year. In 2020, OpenAI announced a 175B-parameter model, GPT-3, but again did not release the trained model files. Instead, OpenAI provided an API that allows developers to integrate the model into their code via web service calls.

EleutherAI, a "decentralized grassroots collective of volunteer researchers," released their first implementation of a GPT-like system, the 2.7B-parameter GPT-Neo model, in March 2021. GPT-Neo was implemented in TensorFlow and trained on TPUs using the parallel library Mesh TensorFlow. The team also began developing GPT-NeoX, a GPU-based implementation that uses Microsoft's DeepSpeed; although the code is open-sourced, there are currently no model files available.

The latest model, GPT-J, was trained using a new library, Mesh-Transformer-JAX. The library uses Google's JAX linear algebra framework instead of a dedicated deep-learning framework such as TensorFlow. Komatsuzaki claims that GPT-J provides "more flexible and faster inference than Tensorflow," and that developing the model took much less time than previous projects. Compared to the 2.7B-parameter GPT-Neo model, GPT-J shows a 125% improvement in training efficiency.
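
For readers unfamiliar with JAX, the toy sketch below shows the function-transformation style the framework encourages: plain numerical Python functions are compiled and differentiated by composable transformations such as jit and grad. It is only an illustration of that style, not code from the Mesh-Transformer-JAX library.

```python
# Toy illustration of JAX's approach: write plain array code, then let
# composable transformations (jit, value_and_grad) compile and differentiate it.
# This is NOT code from mesh-transformer-jax; it is a generic example.
import jax
import jax.numpy as jnp

def dense_layer(params, x):
    # A single dense layer with a GELU activation, written with jax.numpy.
    w, b = params
    return jax.nn.gelu(x @ w + b)

def mse_loss(params, x, y):
    # Mean-squared-error loss over the toy layer's output.
    return jnp.mean((dense_layer(params, x) - y) ** 2)

# value_and_grad builds a function returning both loss and parameter gradients;
# jit compiles it with XLA, the same compiler stack that targets TPUs.
loss_and_grad = jax.jit(jax.value_and_grad(mse_loss))

key = jax.random.PRNGKey(0)
params = (jax.random.normal(key, (16, 4)) * 0.1, jnp.zeros(4))
x = jax.random.normal(key, (8, 16))
y = jnp.ones((8, 4))

loss, grads = loss_and_grad(params, x, y)
print(loss)
```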

In response to concerns about the misuse of its models, EleutherAI co-founder Connor Leahy posted a justification of the release on the organization's blog. Leahy noted that GPT-like models are "simple and theoretically straight-forward," making it infeasible to keep the technology out of the hands of bad actors. Instead, EleutherAI's goal is to enable more widespread safety research, especially for "low-resource" researchers. Leahy also pointed out that many well-funded organizations have already trained models even larger than GPT-3, including Microsoft, NVIDIA, and Google.

In a Twitter discussion about the release, a user asked about the hardware requirements for running the model. Komatsuzaki replied:

For inference, in principle you can modify the code to run it on any hardware that can hold a bit more than 12GB of memory. Best throughput can be achieved with TPUs, in which case you can just run as is. Fine-tuning is more demanding: you need at least TPU v3-8 to do that.
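
The 12GB figure is consistent with storing six billion parameters at 16-bit (2-byte) precision: 6 × 10⁹ parameters × 2 bytes ≈ 12GB, before accounting for additional memory used during generation.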

The GPT-J code and models are available on GitHub. EleutherAI's website hosts an interactive demo of the model's text generation capabilities.
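
As a rough illustration of what inference with the released weights can look like, the sketch below uses the Hugging Face Transformers library rather than the original JAX code. The model identifier EleutherAI/gpt-j-6B, half-precision loading, and the generation settings are assumptions for illustration; the route documented in the release itself is the Mesh-Transformer-JAX repository and its Colab notebook.

```python
# Minimal sketch of text generation with GPT-J, assuming the checkpoint is
# published on the Hugging Face Hub as "EleutherAI/gpt-j-6B" (an assumption,
# not part of EleutherAI's announcement). Loading in float16 keeps the weights
# near the roughly 12GB memory figure quoted above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-j-6B"  # assumed Hub identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
model.to("cuda")  # or "cpu" on a machine with enough RAM

prompt = "EleutherAI's GPT-J is a six billion parameter language model that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sample a short continuation; generation settings here are illustrative only.
output_ids = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    temperature=0.8,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Whichever code path is used, the memory guidance is the same: the full set of half-precision weights needs a bit more than 12GB of accelerator (or host) memory, matching Komatsuzaki's comment above.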
