Introduction#
NeMo provides a powerful command line interface (CLI) that makes it easy to train, fine-tune, evaluate, and deploy models. The CLI follows a consistent pattern that is straightforward to use once you understand the basic structure:
nemo [collection] [command] [options]
Where:
collection is the model collection (e.g., llm)
command is the action to perform (e.g., pretrain, finetune, generate)
options are additional parameters to customize the command
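The three-part pattern above can be sketched as a small Python helper that assembles the argument list for a subprocess call. The helper name `build_nemo_command` is hypothetical and not part of NeMo; it only illustrates the order in which the pieces appear on the command line.

```python
import shlex


def build_nemo_command(collection, command, options=()):
    """Assemble `nemo [collection] [command] [options]` as an argv list.

    Hypothetical helper for illustration; NeMo does not ship this function.
    """
    return ["nemo", collection, command, *options]


# Collection "llm", command "pretrain", followed by the options.
argv = build_nemo_command("llm", "pretrain", ["--factory", "llama3_8b", "--dryrun"])

# shlex.join renders the list the way it would be typed in a shell.
print(shlex.join(argv))
```

Building the command as a list rather than a single string keeps each option a separate argument, which is the form `subprocess.run` expects.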
Note
Currently, the NeMo 2.0 CLI is only available for the LLM collection. More collections will be supported in future releases.
Basic Usage#
To see available commands within a collection, use the help flag:
$ nemo llm --help

Usage: nemo llm [OPTIONS] COMMAND [ARGS]...

[Module] llm

╭─ Options ────────────────────────────────╮
│ --help  Show this message and exit.      │
╰──────────────────────────────────────────╯
╭─ Commands ───────────────────────────────╮
│ train     [Entrypoint] train             │
│ pretrain  [Entrypoint] pretrain          │
│ finetune  [Entrypoint] finetune          │
│ validate  [Entrypoint] validate          │
│ prune     [Entrypoint] prune             │
│ distill   [Entrypoint] distill           │
│ ptq       [Entrypoint] ptq               │
│ deploy    [Entrypoint] deploy            │
│ import    [Entrypoint] import            │
│ export    [Entrypoint] export            │
│ generate  [Entrypoint] generate          │
╰──────────────────────────────────────────╯
Each command represents a different task you can perform with NeMo models.
Pre-training Models#
The pretrain command allows you to train language models from scratch using pre-configured recipes.
Listing Available Recipes#
To see all available pre-training recipes:
$ nemo llm pretrain --help

Usage: nemo llm pretrain [OPTIONS] [ARGUMENTS]

[Entrypoint] pretrain
Pretrains a model using the specified data and trainer, with optional logging, resuming, and optimization.
This function is a wrapper around the `train` function, specifically configured for pretraining tasks.
Note, by default it will use the tokenizer from the model.

╭─ Pre-loaded entrypoint factories, run with --factory ───────────────────────────╮
│ baichuan2_7b            nemo.collections.llm.recipes.baichuan2_7b.pr…  line 142 │
│ baichuan2_7b_optimized  nemo.collections.llm.recipes.baichuan2_7b.pr…  line 190 │
│ bert_110m               nemo.collections.llm.recipes.bert_110m.pretr…  line 50  │
│ bert_340m               nemo.collections.llm.recipes.bert_340m.pretr…  line 50  │
│ chatglm3_6b             nemo.collections.llm.recipes.chatglm3_6b.pre…  line 142 │
│ chatglm3_6b_optimized   nemo.collections.llm.recipes.chatglm3_6b.pre…  line 190 │
│ deepseek_v2             nemo.collections.llm.recipes.deepseek_v2.pre…  line 54  │
│ deepseek_v2_lite        nemo.collections.llm.recipes.deepseek_v2_lit…  line 54  │
│ gemma2_2b               nemo.collections.llm.recipes.gemma2_2b.pretr…  line 53  │
│ gemma2_9b               nemo.collections.llm.recipes.gemma2_9b.pretr…  line 53  │
│ llama3_8b               nemo.collections.llm.recipes.llama3_8b.pretr…  line 145 │
│ llama3_70b              nemo.collections.llm.recipes.llama3_70b.pret…  line 145 │
│ mixtral_8x7b            nemo.collections.llm.recipes.mixtral_8x7b.pr…  line 143 │
│ nemotron3_8b            nemo.collections.llm.recipes.nemotron3_8b.pr…  line 56  │
│ nemotron4_15b           nemo.collections.llm.recipes.nemotron4_15b.p…  line 55  │
│ ... (output truncated)                                                          │
╰─────────────────────────────────────────────────────────────────────────────────╯
Running Pre-training with Default Recipes#
To start pre-training with a default recipe:
$ nemo llm pretrain --factory llama3_8b
This command will configure and start pre-training a Llama 3 8B model using the default settings. The output will show a preview of the resolved configuration values before starting.
Add the --dryrun flag to preview the configuration without starting the training:
$ nemo llm pretrain --factory llama3_8b --dryrun
Configuring global options
Dry run for task nemo.collections.llm.api:pretrain
Resolved Arguments
┏━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Argument Name ┃ Resolved Value                                        ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ data          │ MockDataModule(seq_length=8192, micro_batch_size=1,   │
│               │ global_batch_size=512)                                │
│ model         │ LlamaModel(config=Llama3Config8B())                   │
│ trainer       │ Trainer(                                              │
│               │     accelerator='gpu',                                │
│               │     strategy=MegatronStrategy(                        │
│               │         tensor_model_parallel_size=1,                 │
│               │         pipeline_model_parallel_size=1,               │
│               │         context_parallel_size=2,                      │
│               │         sequence_parallel=False,                      │
│               │     ),                                                │
│               │     devices=8,                                        │
│               │     num_nodes=1,                                      │
│               │     max_steps=1168251,                                │
│               │ )                                                     │
│ ... (output truncated for brevity)                                    │
└───────────────┴───────────────────────────────────────────────────────┘
Customizing Factory Parameters#
You can pass parameters directly to the factory function:
$ nemo llm pretrain --factory "llama3_70b(num_nodes=128)"
This example configures the Llama 3 70B model to use 128 nodes for distributed training.
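To make the shape of the `--factory` value concrete, here is a minimal sketch that splits such a string into a factory name and keyword arguments using Python's `ast` module. The function `parse_factory` is hypothetical, and this is not how NeMo-Run parses the value internally; it only shows that the argument reads like an ordinary Python call.

```python
import ast


def parse_factory(expr):
    """Split 'name' or 'name(key=value, ...)' into (name, kwargs).

    Illustrative only: supports a bare name or a call with literal keyword
    arguments, which is all the examples in this guide use.
    """
    node = ast.parse(expr, mode="eval").body
    if isinstance(node, ast.Name):
        return node.id, {}
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name) and not node.args:
        kwargs = {}
        for kw in node.keywords:
            if kw.arg is None:  # reject **kwargs expansion
                raise ValueError(f"unsupported factory expression: {expr!r}")
            kwargs[kw.arg] = ast.literal_eval(kw.value)
        return node.func.id, kwargs
    raise ValueError(f"unsupported factory expression: {expr!r}")


print(parse_factory("llama3_70b(num_nodes=128)"))
print(parse_factory("llama3_8b"))
```

Because the value contains parentheses, it has to be quoted on the command line so the shell passes it through unchanged.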
Overriding Configuration Parameters#
The CLI supports overriding any configuration parameter using Hydra-style dot notation:
$ nemo llm pretrain --factory llama3_70b trainer.max_steps=2000
This syntax follows the pattern component.parameter=value, allowing you to navigate nested configurations. You can override multiple parameters at once by adding more space-separated overrides:
$ nemo llm pretrain --factory llama3_70b trainer.max_steps=2000 optim.config.lr=5e-5 data.global_batch_size=256
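The dot notation can be pictured as a walk through a nested structure. The sketch below applies the three overrides above to a plain dictionary. Both the dictionary layout and the function `apply_override` are assumptions for illustration; real recipes are configuration objects rather than dicts, and the starting learning rate is a placeholder.

```python
import ast


def apply_override(config, override):
    """Apply one 'a.b.c=value' override to a nested dict in place (toy model)."""
    path, _, raw = override.partition("=")
    *parents, leaf = path.split(".")
    node = config
    for key in parents:
        node = node[key]  # raises KeyError if the path does not exist
    try:
        node[leaf] = ast.literal_eval(raw)  # numbers, booleans, quoted strings
    except (ValueError, SyntaxError):
        node[leaf] = raw  # anything else is kept as a plain string


# Stand-in for a resolved recipe; the lr value is a placeholder.
config = {
    "trainer": {"max_steps": 1168251},
    "optim": {"config": {"lr": 3e-4}},
    "data": {"global_batch_size": 512},
}

for item in ["trainer.max_steps=2000", "optim.config.lr=5e-5", "data.global_batch_size=256"]:
    apply_override(config, item)

print(config)
```

Each override touches only the leaf it names, so the rest of the recipe keeps its default values.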
Interactive Configuration with REPL Mode#
For interactive recipe modification, you can use the --repl flag:
$ nemo llm pretrain --factory llama3_70b --repl
This command opens an interactive Python REPL (Read-Eval-Print Loop) where you can:
Inspect the entire configuration
Modify parameters interactively
Test different settings before launching the training job
Execute custom Python code to set up your configuration
Inside the REPL, you’ll have access to all the components of the configuration (model, data, trainer, etc.) and can modify them directly:
# Example of what you might do in the REPL

# View the model config
print(model.config)

# Modify learning rate
optim.config.lr = 2e-5

# Change the number of training steps
trainer.max_steps = 5000

# Start the training when ready
run()
Fine-tuning Models#
Similar to pre-training, NeMo provides recipes for fine-tuning models:
$ nemo llm finetune --help

Usage: nemo llm finetune [OPTIONS] [ARGUMENTS]

[Entrypoint] finetune
Finetunes a model using the specified data and trainer, with optional logging, resuming, and PEFT.
Note, by default it will use the tokenizer from the model.

╭─ Pre-loaded entrypoint factories, run with --factory ───────────────────────────╮
│ baichuan2_7b            nemo.collections.llm.recipes.baichuan2_7b.fi…  line 236 │
│ chatglm3_6b             nemo.collections.llm.recipes.chatglm3_6b.fin…  line 236 │
│ deepseek_v2             nemo.collections.llm.recipes.deepseek_v2.fin…  line 108 │
│ deepseek_v2_lite        nemo.collections.llm.recipes.deepseek_v2_lit…  line 107 │
│ gemma2_2b               nemo.collections.llm.recipes.gemma2_2b.finet…  line 173 │
│ gemma2_9b               nemo.collections.llm.recipes.gemma2_9b.finet…  line 173 │
│ llama2_7b               nemo.collections.llm.recipes.llama2_7b.finet…  line 230 │
│ llama3_8b               nemo.collections.llm.recipes.llama3_8b.finet…  line 245 │
│ llama3_70b              nemo.collections.llm.recipes.llama3_70b.fine…  line 251 │
│ mixtral_8x7b            nemo.collections.llm.recipes.mixtral_8x7b.fi…  line 240 │
│ nemotron3_8b            nemo.collections.llm.recipes.nemotron3_8b.fi…  line 253 │
│ nemotron4_15b           nemo.collections.llm.recipes.nemotron4_15b.f…  line 227 │
│ ... (output truncated)                                                          │
╰─────────────────────────────────────────────────────────────────────────────────╯
The available models for fine-tuning include a wide range of architectures:
Llama 2 and Llama 3 family
Nemotron 3 and Nemotron 4 family
Mixtral and other mixture-of-experts models
Mamba2 models including SSM and hybrid architectures
Encoder-decoder models like T5
And many more
Fine-tuning recipes include support for Parameter-Efficient Fine-Tuning (PEFT) methods. Notice that the finetune command has an additional peft argument compared to the pretrain command.
To fine-tune a model:
$ nemo llm finetune --factory llama3_8b
Creating and Running Custom Recipes#
You can create custom recipes in Python scripts that use the same CLI interface. Here’s how a custom recipe might look:
# custom_recipe.py
import nemo_run as run

from nemo.collections import llm
from nemo.collections.llm.recipes import llama3_8b, llama3_70b


def custom_llama3_8b():
    pretrain = llama3_8b.pretrain_recipe(num_nodes=1, num_gpus_per_node=8)
    pretrain.trainer.val_check_interval = 400
    pretrain.log.ckpt.save_top_k = -1
    pretrain.log.ckpt.every_n_train_steps = 400
    pretrain.trainer.max_steps = 1000
    return pretrain


def custom_llama3_70b():
    pretrain = llama3_70b.pretrain_recipe(num_nodes=1, num_gpus_per_node=8)
    pretrain.trainer.val_check_interval = 400
    pretrain.log.ckpt.save_top_k = -1
    pretrain.log.ckpt.every_n_train_steps = 400
    pretrain.trainer.max_steps = 1000
    return pretrain


if __name__ == "__main__":
    # When running this file, it will run the `custom_llama3_8b` recipe
    # To select the `custom_llama3_70b` recipe, use the following command:
    #   python custom_recipe.py --factory custom_llama3_70b
    # This will automatically call the custom_llama3_70b that's defined above
    # Note that any parameter can be overwritten by using the following syntax:
    #   python custom_recipe.py trainer.max_steps=2000
    # You can even apply transformations when triggering the CLI as if it's Python code
    #   python custom_recipe.py "trainer.max_steps*=2"
    run.cli.main(llm.pretrain, default_factory=custom_llama3_8b)
When running the custom_recipe.py file, it will execute the custom_llama3_8b recipe by default. However, you can select different recipes or modify parameters:
To select the custom_llama3_70b recipe:
python custom_recipe.py --factory custom_llama3_70b
To overwrite any parameter:
python custom_recipe.py trainer.max_steps=2000
You can even apply transformations:
python custom_recipe.py "trainer.max_steps*=2"  # Doubles the max_steps value
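A transformation such as `trainer.max_steps*=2` combines the recipe's current value with the value on the command line. The sketch below shows one way to interpret such an argument. The function `transform`, the regular expression, and the set of supported operators are all assumptions for illustration, not a description of NeMo-Run's parser.

```python
import ast
import operator
import re

# Operators this toy parser understands; an empty prefix means plain assignment.
OPS = {"": None, "+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}
PATTERN = re.compile(r"^(?P<key>[\w.]+)(?P<op>[-+*/]?)=(?P<value>.+)$")


def transform(current, expr):
    """Return (key, new_value) for 'key=value' or 'key<op>=value'."""
    match = PATTERN.match(expr)
    if match is None:
        raise ValueError(f"cannot parse {expr!r}")
    value = ast.literal_eval(match["value"])
    op = OPS[match["op"]]
    new_value = value if op is None else op(current, value)
    return match["key"], new_value


# The custom recipes above set max_steps to 1000, so "*=2" yields 2000.
print(transform(1000, "trainer.max_steps*=2"))
```

The quotes around the argument on the command line matter: without them, the shell may try to expand the `*` as a glob pattern.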
Text Generation#
NeMo provides a generate command for inference with trained models:
$ nemo llm generate
This command is used for text generation with trained NeMo LLM models. It takes a checkpoint path and a list of prompts, generates text based on the loaded model and parameters, and returns the generated text.
The command supports parameters like:
path: Path to the model checkpoint
trainer: NeMo trainer configuration
prompts: List of input prompts for generation
inference_params: Generation parameters like temperature, top_k, and number of tokens to generate
text_only: Whether to return only text or also metadata
Advanced Features#
Model Import and Export#
NeMo CLI provides commands for importing external models (like Hugging Face models) and exporting NeMo models:
$ nemo llm import --help  # Import models from other frameworks
$ nemo llm export --help  # Export NeMo models
Quantization and Pruning#
For model optimization, NeMo offers post-training quantization (PTQ) and pruning:
$ nemo llm ptq --help    # Post-training quantization
$ nemo llm prune --help  # Model pruning
Model Distillation#
For creating smaller, more efficient models:
$ nemo llm distill --help  # Knowledge distillation
Integration with NeMo-Run#
NeMo supports scaling to thousands of GPUs using NeMo-Run. For examples of launching large-scale experiments using NeMo-Run, refer to Quickstart with NeMo-Run.
The CLI allows you to specify custom execution environments by passing in run.executor=..., which can be a factory for any of the supported executors from NeMo-Run. This enables you to run your jobs in various environments like Docker containers or Slurm clusters without modifying your recipe code.
You can see what executors are available in your environment by using the --help flag with any command. The help output will show a section called “Registered executors” at the bottom:
$ nemo llm finetune --help
# ... other help output ...
╭─ Registered executors ──────────────────────────────────────────────────────────╮
│ torchrun                nemo.collections.llm.recipes.run.executor.to…  line 20  │
╰─────────────────────────────────────────────────────────────────────────────────╯
This shows that in this case, there’s a torchrun executor registered by default. You can reference this in your command line with run.executor=torchrun.
Here are some example executor factories you can define in your custom recipe:
import nemo_run as run

# BASE_DIR, ACCOUNT, SLURM_PARTITION, SLURM_LOGIN_NODE, and USER are
# placeholders that you must define for your own environment.


@run.cli.factory
@run.autoconvert
def docker() -> run.Executor:
    return run.DockerExecutor(
        container_image="nvcr.io/nvidia/nemo:dev",
        volumes=[
            f"{BASE_DIR}/opt/NeMo-Run:/opt/NeMo-Run",
            f"{BASE_DIR}/opt/NeMo:/opt/NeMo",
            f"{BASE_DIR}/opt/megatron-lm:/opt/Megatron-LM",
        ],
        env_vars={
            "HF_HOME": "/workspaces/models/hf",
            "NEMO_HOME": "/workspaces/models/nemo",
        },
    )


@run.cli.factory
@run.autoconvert
def slurm_cluster() -> run.Executor:
    return run.SlurmExecutor(
        account=ACCOUNT,
        partition=SLURM_PARTITION,
        job_name_prefix=f"{ACCOUNT}-nemo-ux:",
        job_dir=BASE_DIR,
        container_image="nvcr.io/nvidia/nemo:dev",
        container_mounts=[
            f"/home/{USER}:/home/{USER}",
            "/lustre:/lustre",
        ],
        time="4:00:00",
        gpus_per_node=8,
        tunnel=run.SSHTunnel(host=SLURM_LOGIN_NODE, user=USER, job_dir=BASE_DIR),
    )
With these executor factories defined, you can easily select which execution environment to use via the command line:
# Run in a Docker container
$ python custom_recipe.py run.executor=docker

# Run on a Slurm cluster
$ python custom_recipe.py run.executor=slurm_cluster
This approach lets you develop recipes locally and then deploy them to different computing environments without changing your code.
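Selecting an executor by name can be pictured as a lookup in a registry of factory functions. The sketch below is a toy model of that idea in plain Python. `ToyExecutor`, `register`, and `resolve_executor` are hypothetical names and do not reflect how NeMo-Run stores or resolves its factories.

```python
from dataclasses import dataclass, field

# Toy registry mapping a factory name to the function that builds an executor.
EXECUTOR_FACTORIES = {}


def register(fn):
    """Record a factory under its function name (hypothetical stand-in decorator)."""
    EXECUTOR_FACTORIES[fn.__name__] = fn
    return fn


@dataclass
class ToyExecutor:
    kind: str
    settings: dict = field(default_factory=dict)


@register
def docker():
    return ToyExecutor("docker", {"container_image": "nvcr.io/nvidia/nemo:dev"})


@register
def slurm_cluster():
    return ToyExecutor("slurm", {"gpus_per_node": 8, "time": "4:00:00"})


def resolve_executor(argument):
    """Turn 'run.executor=<name>' into an executor built by the named factory."""
    key, _, name = argument.partition("=")
    if key != "run.executor" or name not in EXECUTOR_FACTORIES:
        raise ValueError(f"unknown executor argument: {argument!r}")
    return EXECUTOR_FACTORIES[name]()


print(resolve_executor("run.executor=slurm_cluster"))
```

Keeping environment details inside named factories is what allows the recipe itself to stay unchanged when only the executor name on the command line differs.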
Learning More About the CLI#
If you’re interested in understanding the internals of the NeMo CLI and NeMo-Run CLI system, or want to create your own CLI entrypoints and experiments, you can find detailed examples and tutorials in the NeMo-Run entrypoint examples.
This repository includes:
Detailed explanations of CLI concepts like entrypoints, factories, and partials
Examples of creating single task entrypoints
Examples of creating experiment entrypoints for sequential and parallel execution
Advanced CLI features like Pythonic argument parsing and interactive configuration
Best practices for creating effective CLI interfaces
These examples provide deeper insight into how the NeMo CLI works and how you can leverage its features for your own custom workflows.
Summary#
The NeMo CLI provides a comprehensive interface for working with large language models:
Model Architecture Support: Supports a wide range of architectures including Llama, Mixtral, Nemotron, Mamba, T5, and many others
Training Options:
Pre-training from scratch
Fine-tuning existing models
Parameter-efficient fine-tuning (PEFT)
Configuration Flexibility:
Override any parameter using dot notation
Interactive configuration with REPL mode
Create custom recipes in Python
Scalability:
Supports training on thousands of GPUs
Integrates with NeMo-Run for cluster management
Deployment and Optimization:
Model export and import
Quantization
Pruning
Distillation
This design allows you to quickly experiment with different models and configurations without having to write custom training scripts from scratch.