EleutherAI Open-Sources Six Billion Parameter GPT-3 Clone GPT-J
Jul 13, 2021 · 2 min read
A team of researchers from EleutherAI have open-sourced GPT-J, a six-billion parameter natural language processing (NLP) AI model based on GPT-3. The model was trained on an 800GB open-source text dataset and has performance comparable to a GPT-3 model of similar size.
Developer Aran Komatsuzaki announced the release on his blog. The model was trained on EleutherAI's Pile dataset using Google Cloud's v3-256 TPUs; training took approximately five weeks. On common NLP benchmark tasks, GPT-J achieves an accuracy similar to OpenAI's published results for their 6.7B parameter version of GPT-3. EleutherAI's release includes the model code, pre-trained weight files, a Colab notebook, and a demo website. According to Komatsuzaki,
GPT-J is the best-performing publicly available Transformer [language model] in terms of zero-shot performance on various [down-stream] tasks.
OpenAI first published a paper on generative pre-trained transformers (GPT), an unsupervised learning model that achieved state-of-the-art results on several NLP tasks, in 2018. In early 2019, OpenAI announced a 1.5B parameter model called GPT-2. OpenAI initially declined to release the largest trained model, citing "concerns about malicious applications of the technology," but did release the model later that year. Last year, OpenAI announced a 175B parameter model, GPT-3, but again did not release the trained model files. Instead, OpenAI provided an API that allows developers to integrate the model into their code via web service calls.
EleutherAI, a "decentralized grassroots collective of volunteer researchers," released their first implementation of a GPT-like system, the 2.7B parameter GPT-Neo model, in March 2021. GPT-Neo was implemented in TensorFlow and trained on TPUs using the parallel library Mesh TensorFlow. The team also began developing GPT-NeoX, a GPU-based implementation that uses Microsoft's DeepSpeed; although the code is open-sourced, there are currently no model files available.
The latest model, GPT-J, was trained using a new library, Mesh-Transformer-JAX. The library uses Google's JAX linear algebra framework, instead of a dedicated deep-learning framework such as TensorFlow. Komatsuzaki claims that GPT-J provides "more flexible and faster inference than Tensorflow," and developing the model took much less time than previous projects. Compared to the 2.7B parameter GPT-Neo model, GPT-J shows a 125% improvement in training efficiency.
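JAX's appeal for projects like Mesh-Transformer-JAX is that ordinary numerical Python functions can be transformed wholesale, for example JIT-compiled to XLA for TPUs or GPUs. The following is a minimal illustrative sketch, not code from the GPT-J repository: it JIT-compiles a scaled dot-product attention score computation, the core operation in GPT-style Transformers.

```python
# Illustrative sketch of the JAX programming style (not GPT-J code):
# a pure function compiled with jax.jit for XLA-accelerated execution.
import jax
import jax.numpy as jnp

def attention_scores(q, k):
    # Scaled dot-product attention scores with a softmax over keys.
    return jax.nn.softmax(q @ k.T / jnp.sqrt(q.shape[-1]), axis=-1)

fast_scores = jax.jit(attention_scores)  # compile once, reuse many times

q = jnp.ones((4, 8))  # 4 query vectors of dimension 8
k = jnp.ones((4, 8))  # 4 key vectors of dimension 8
scores = fast_scores(q, k)
print(scores.shape)  # (4, 4): one row of attention weights per query
```

Because `attention_scores` is a pure function, the same code also composes with JAX's other transformations, such as `jax.grad` for training.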
In response to concerns about the misuse of its models, EleutherAI co-founder Connor Leahy posted a justification of the release on the organization's blog. Leahy noted that GPT-like models are "simple and theoretically straight-forward," making it infeasible to keep the technology out of the hands of bad actors. Instead, EleutherAI's goal is to enable more widespread safety research, especially for "low-resource" researchers. Leahy also pointed out that many well-funded organizations have already trained even larger models than GPT-3, including Microsoft, NVIDIA, and Google.
In a Twitter discussion about the release, a user asked about the hardware requirements for running the model. Komatsuzaki replied:
For inference, in principle you can modify the code to run it on any hardware that can hold a bit more than 12GB of memory. Best throughput can be achieved with TPUs, in which case you can just run as is. Fine-tuning is more demanding: you need at least TPU v3-8 to do that.
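The "a bit more than 12GB" figure lines up with storing roughly six billion parameters in 16-bit floating point, before counting activations and other overhead. A quick back-of-the-envelope check (the round six-billion parameter count is an approximation):

```python
# Rough memory estimate for GPT-J inference weights.
params = 6e9          # approximately six billion parameters (assumption)
bytes_per_param = 2   # 16-bit (fp16/bfloat16) weights
gib = params * bytes_per_param / 2**30
print(f"{gib:.1f} GiB")  # about 11.2 GiB, before activations and overhead
```

Fine-tuning needs several times more memory than this, since optimizer state and gradients are held alongside the weights, which is consistent with Komatsuzaki's recommendation of at least a TPU v3-8.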
The GPT-J code and models are available on GitHub. EleutherAI's website hosts an interactive demo of the model's text generation capabilities.