Large Language Models#

NeMo Framework has everything needed to train Large Language Models, including setting up the compute cluster, downloading data, and selecting model hyperparameters. NeMo 2.0 uses NeMo-Run to make it easy to scale LLMs to thousands of GPUs.

The following LLMs are currently supported in NeMo 2.0:

Default configurations are provided for each model and are outlined in the model-specific documentation linked above. Every configuration can be modified in order to train on new datasets or test new model hyperparameters.
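As a rough illustration of how a default configuration can be adjusted, the sketch below starts from a pretraining recipe and overrides a few hyperparameters before launching with NeMo-Run. This is a hedged example, not a definitive workflow: it assumes a Llama 3 8B recipe is available under `nemo.collections.llm`, and the exact attribute paths (`trainer.max_steps`, `optim.config.lr`) and recipe arguments may differ by NeMo version. It also requires a GPU machine or cluster, so it is shown as a configuration sketch rather than a runnable snippet.

```python
# Illustrative sketch only -- assumes NeMo 2.0 and NeMo-Run are installed,
# and that the llama3_8b recipe module exists in this NeMo version.
import nemo_run as run
from nemo.collections import llm

# Start from a provided default configuration (recipe).
recipe = llm.llama3_8b.pretrain_recipe(
    name="llama3_8b_custom",   # hypothetical experiment name
    num_nodes=1,
    num_gpus_per_node=8,
)

# Override default hyperparameters before launching
# (attribute paths are assumptions; check your recipe's structure).
recipe.trainer.max_steps = 1000
recipe.optim.config.lr = 3e-4

# Run locally; swap in a Slurm executor to scale out to a cluster.
run.run(recipe, executor=run.LocalExecutor())
```

The same pattern applies to dataset changes: replace the recipe's data module with one pointing at your own corpus, then launch as above.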

Training long context models, or extending the context length of pre-trained models, is also supported in NeMo:

For information on deploying LLMs, see:

LLM Deployment Overview