EleutherAI Open-Sources Six Billion Parameter GPT-3 Clone GPT-J
Jul 13, 2021 · 2 min read
A team of researchers from EleutherAI have open-sourced GPT-J, a six-billion parameter natural language processing (NLP) AI model based on GPT-3. The model was trained on an 800GB open-source text dataset and has performance comparable to a GPT-3 model of similar size.
Developer Aran Komatsuzaki announced the release on his blog. The model was trained on EleutherAI's Pile dataset using Google Cloud's v3-256 TPUs; training took approximately five weeks. On common NLP benchmark tasks, GPT-J achieves an accuracy similar to OpenAI's published results for their 6.7B parameter version of GPT-3. EleutherAI's release includes the model code, pre-trained weight files, a Colab notebook, and a demo website. According to Komatsuzaki,
GPT-J is the best-performing publicly available Transformer [language model] in terms of zero-shot performance on various [down-stream] tasks.
OpenAI first published a paper on generative pre-trained transformers (GPT), an unsupervised learning model that achieved state-of-the-art results on several NLP tasks, in 2018. In early 2019, OpenAI announced a 1.5B parameter model called GPT-2. OpenAI initially declined to release the largest trained model, citing "concerns about malicious applications of the technology," but did release the model later that year. Last year, OpenAI announced a 175B parameter model, GPT-3, but again did not release the trained model files. Instead, OpenAI provided an API that allows developers to integrate the model into their code via web service calls.
EleutherAI, a "decentralized grassroots collective of volunteer researchers," released their first implementation of a GPT-like system, the 2.7B parameter GPT-Neo model, in March 2021. GPT-Neo was implemented in TensorFlow and trained on TPUs using the parallel library Mesh TensorFlow. The team also began developing GPT-NeoX, a GPU-based implementation that uses Microsoft's DeepSpeed; although the code is open-sourced, there are currently no model files available.
The latest model, GPT-J, was trained using a new library, Mesh-Transformer-JAX. The library uses Google's JAX linear algebra framework, instead of a dedicated deep-learning framework such as TensorFlow. Komatsuzaki claims that GPT-J provides "more flexible and faster inference than Tensorflow," and developing the model took much less time than previous projects. Compared to the 2.7B parameter GPT-Neo model, GPT-J shows a 125% improvement in training efficiency.
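JAX's appeal for projects like Mesh-Transformer-JAX is that ordinary numerical Python functions can be transformed wholesale, for example JIT-compiled to XLA for TPUs or GPUs. The following is a minimal illustrative sketch, not code from the GPT-J repository: it JIT-compiles a scaled dot-product attention score computation, the core operation in GPT-style Transformers.

```python
# Illustrative sketch of the JAX programming style (not GPT-J code):
# a pure function compiled with jax.jit for XLA-accelerated execution.
import jax
import jax.numpy as jnp

def attention_scores(q, k):
    # Scaled dot-product attention scores with a softmax over keys.
    return jax.nn.softmax(q @ k.T / jnp.sqrt(q.shape[-1]), axis=-1)

fast_scores = jax.jit(attention_scores)  # compile once, reuse many times

q = jnp.ones((4, 8))  # 4 query vectors of dimension 8
k = jnp.ones((4, 8))  # 4 key vectors of dimension 8
scores = fast_scores(q, k)
print(scores.shape)  # (4, 4): one row of attention weights per query
```

Because `attention_scores` is a pure function, the same code also composes with JAX's other transformations, such as `jax.grad` for training.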
In response to concerns about the misuse of its models, EleutherAI co-founder Connor Leahy posted a justification of the release on the organization's blog. Leahy noted that GPT-like models are "simple and theoretically straight-forward," making it infeasible to keep the technology out of the hands of bad actors. Instead, EleutherAI's goal is to enable more widespread safety research, especially for "low-resource" researchers. Leahy also pointed out that many well-funded organizations have already trained even larger models than GPT-3, including Microsoft, NVIDIA, and Google.
In a Twitter discussion about the release, a user asked about the hardware requirements for running the model. Komatsuzaki replied:
For inference, in principle you can modify the code to run it on any hardware that can hold a bit more than 12GB of memory. Best throughput can be achieved with TPUs, in which case you can just run as is. Fine-tuning is more demanding: you need at least TPU v3-8 to do that.
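The "a bit more than 12GB" figure lines up with storing roughly six billion parameters in 16-bit floating point, before counting activations and other overhead. A quick back-of-the-envelope check (the round six-billion parameter count is an approximation):

```python
# Rough memory estimate for GPT-J inference weights.
params = 6e9          # approximately six billion parameters (assumption)
bytes_per_param = 2   # 16-bit (fp16/bfloat16) weights
gib = params * bytes_per_param / 2**30
print(f"{gib:.1f} GiB")  # about 11.2 GiB, before activations and overhead
```

Fine-tuning needs several times more memory than this, since optimizer state and gradients are held alongside the weights, which is consistent with Komatsuzaki's recommendation of at least a TPU v3-8.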
The GPT-J code and models are available on GitHub. EleutherAI's website hosts an interactive demo of the model's text generation capabilities.