Hugging Face Integration#

Using the CLI for Hugging Face Integration#

NeMo provides command-line tools for importing models from Hugging Face and exporting NeMo models to Hugging Face format.

Importing from Hugging Face#

To import a model from Hugging Face, use the `nemo llm import` command. When importing from Hugging Face Transformers, you must prefix the repository path with `hf://`. The basic syntax is:

```bash
nemo llm import model=<nemo_model_name> source="hf://<huggingface_repo>"
```

For example, to import Llama 3.2 1B from Hugging Face:

```bash
nemo llm import model=llama32_1b source="hf://meta-llama/Llama-3.2-1B"
```

Note that the `hf://` prefix is required; without it, the import will fail. This prefix tells NeMo to use the Hugging Face Transformers importer for the conversion. A new, improved version of this importer will be released soon.

Here are more examples with the required `hf://` prefix:

```bash
# Import Mistral 7B
nemo llm import model=mistral_7b source="hf://mistralai/Mistral-7B-v0.1"

# Import Mixtral 8x7B
nemo llm import model=mixtral_8x7b source="hf://mistralai/Mixtral-8x7B-v0.1"
```

You can also specify an output path and whether to overwrite existing checkpoints:

```bash
nemo llm import model=llama32_1b source="hf://meta-llama/Llama-3.2-1B" \
    output_path="/path/to/save" overwrite=true
```

To see all available models that can be imported, use:

```bash
nemo llm import --help
```

This will show a list of supported models under the “Factory for model” section, including:

  • Various Llama models (llama32_1b, llama32_3b, llama3_8b, llama3_70b, etc.)

  • Mistral and Mixtral models

  • Nemotron models

  • Mamba2 models

  • T5 models

  • And many more

You can preview what will happen during the import, without actually performing it, by using the `--dryrun` flag:

```bash
nemo llm import model=llama32_1b source="hf://meta-llama/Llama-3.2-1B" --dryrun
```

Exporting to Hugging Face#

To export a NeMo model to Hugging Face format, use the `nemo llm export` command:

```bash
nemo llm export model=<model_name> source=<nemo_checkpoint> output_path=<hf_output_dir>
```

For example:

```bash
nemo llm export model=llama32_1b source=/path/to/nemo/checkpoint \
    output_path=/path/to/hf/output
```

The exported model will be in a format compatible with Hugging Face’s Transformers library.

Implementing a New Model Importer#

The sections below explain how to implement Hugging Face model conversion for a new model architecture. This is only needed if you want to add support for converting a new type of model; most users can simply use the CLI commands above with the existing supported models.

Model Conversion Classes#

The `io.ConnectorMixin` class can be used to make a NeMo model compatible with Hugging Face: it makes it possible to load Hugging Face models into NeMo and to save NeMo models in Hugging Face format. The `GPTModel` class below shows how to implement this (the other mixins can be ignored here):

```python
class GPTModel(L.LightningModule, io.IOMixin, io.ConnectorMixin, fn.FNMixin):
    ...
```

The generic base class is extended by several models to provide two-way integration with Hugging Face. These models include:

  • GemmaModel

  • LlamaModel

  • MistralModel

  • MixtralModel

Fine-Tune a Model using a Hugging Face Checkpoint#

To fine-tune a model, use the following script:

```python
import nemo_run as run
from nemo.collections import llm
from nemo import lightning as nl


@run.factory
def mistral():
    return llm.MistralModel()


@run.factory
def trainer(devices=2) -> nl.Trainer:
    strategy = nl.MegatronStrategy(tensor_model_parallel_size=devices)
    return nl.Trainer(
        devices=devices,
        max_steps=100,
        accelerator="gpu",
        strategy=strategy,
        plugins=nl.MegatronMixedPrecision(precision="bf16-mixed"),
    )


resume = nl.AutoResume(import_path="hf://mistralai/Mistral-7B-v0.1")
sft = run.Partial(llm.finetune, model=mistral, data=llm.squad, trainer=trainer, resume=resume)
```

The script will try to load a `model_importer` with the name "hf" on the `MistralModel`. It then loads the following class inside `nemo/collections/llm/gpt/model/mistral.py` to perform the conversion:

```python
@io.model_importer(MistralModel, "hf")
class HFMistralImporter(io.ModelConnector["MistralForCausalLM", MistralModel]):
    ...
```

Note that this conversion only occurs once. Afterwards, the converted checkpoint is loaded from the `$NEMO_HOME` directory.
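The convert-once behavior can be sketched as follows. This is a toy illustration under simplified assumptions, with `import_once` standing in for the real checkpoint caching under `$NEMO_HOME`:

```python
from pathlib import Path
import tempfile

def import_once(cache_root: Path, name: str, convert) -> Path:
    # Run the (expensive) conversion only if no cached checkpoint exists.
    target = cache_root / name
    if not target.exists():
        target.mkdir(parents=True)
        convert(target)        # first call: perform the conversion
    return target              # later calls: reuse the cached result

calls = []
with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    import_once(root, "mistral_7b", lambda p: calls.append(p))
    import_once(root, "mistral_7b", lambda p: calls.append(p))
assert len(calls) == 1  # the conversion ran only once
```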

Create a Model Importer#

To implement a custom model importer, you can follow the structure of the `HFMistralImporter` class. Here is a step-by-step explanation of how to create a custom model importer.

  1. Define a new class that inherits from `io.ModelConnector`:

    ```python
    @io.model_importer(YourModel, "source_format")
    class CustomImporter(io.ModelConnector["SourceModel", YourModel]):
        # Implementation here
        ...
    ```
  2. Replace `YourModel` with your target model class, `"source_format"` with the format you're importing from, and `"SourceModel"` with the source model type. You can choose `"source_format"` to be any string. In the Mistral example, we use the string `"hf"` to indicate that we are importing from Hugging Face.

  3. Implement the required methods:

    ```python
    class CustomImporter(io.ModelConnector["SourceModel", YourModel]):
        def init(self) -> YourModel:
            # Initialize and return your target model
            return YourModel(self.config, tokenizer=self.tokenizer)

        def apply(self, output_path: Path) -> Path:
            # Load source model, convert state, and save target model
            source = SourceModel.from_pretrained(str(self))
            target = self.init()
            trainer = self.nemo_setup(target)
            self.convert_state(source, target)
            self.nemo_save(output_path, trainer)
            # Clean up and return output path
            teardown(trainer, target)
            return output_path

        def convert_state(self, source, target):
            # Define mapping between source and target model states
            mapping = {
                "source_key1": "target_key1",
                "source_key2": "target_key2",
                # ... more mappings ...
            }
            return io.apply_transforms(source, target, mapping=mapping, transforms=[])

        @property
        def tokenizer(self) -> "YourTokenizer":
            # Return the appropriate tokenizer for your model
            return YourTokenizer(str(self))

        @property
        def config(self) -> YourModelConfig:
            # Load source config and convert to target config
            source_config = SourceConfig.from_pretrained(str(self))
            return YourModelConfig(
                # Set appropriate parameters based on source_config
            )
    ```
  4. Implement custom state transforms:

    The `@io.state_transform` decorator is a powerful tool for defining custom transformations between source and target model states. It allows you to specify complex mappings that go beyond simple key renaming.

    ```python
    @io.state_transform(
        source_key=("source.key1", "source.key2"),
        target_key="target.key",
    )
    def _custom_transform(ctx: io.TransformCTX, source1, source2):
        # Implement custom transformation logic
        return transformed_data
    ```

    The following list describes the key aspects of the `state_transform` decorator:

    1. Source and Target Keys:

      • `source_key`: Specifies the key(s) in the source model state. Can be a single string or a tuple of strings.

      • `target_key`: Specifies the key in the target model state where the transformed data will be stored.

      • Wildcard `*`: Used to apply the transform across multiple layers or components.

    2. Transform Function:

      • The decorated function receives the source tensor(s) as arguments.

      • It should return the transformed tensor(s) for the target model.

    3. Context Object:

      • The first argument, `ctx`, is a `TransformCTX` object. It provides access to both source and target models and their configs.

    4. Multiple Source Keys:

      • When multiple source keys are specified, the transform function receives multiple tensors as arguments.

    5. Flexible Transformations:

      • You can perform arbitrary operations on the tensors, including reshaping, concatenating, splitting, or applying mathematical operations.

    The following example shows a more complex transform using wildcards:

    ```python
    @io.state_transform(
        source_key=(
            "model.layers.*.self_attn.q_proj.weight",
            "model.layers.*.self_attn.k_proj.weight",
            "model.layers.*.self_attn.v_proj.weight",
        ),
        target_key="decoder.layers.*.self_attention.qkv.weight",
    )
    def _combine_qkv_weights(ctx: io.TransformCTX, q, k, v):
        # Combine separate Q, K, V weights into a single QKV tensor
        return torch.cat([q, k, v], dim=0)
    ```

    This transform combines the separate Q, K, and V weight matrices from the source model into a single QKV weight matrix for the target model. The use of `*` in the keys is crucial:

    • In `source_key`, `model.layers.*` matches all layers in the source model.

    • In `target_key`, `decoder.layers.*` corresponds to all layers in the target model.

    The wildcard ensures that this transform is applied to each layer of the model automatically. Without it, you’d need to write separate transforms for each layer manually. This makes the code more concise and easier to maintain, especially for models with many layers.

    The transform function itself (`_combine_qkv_weights`) will be called once for each layer, with `q`, `k`, and `v` containing the weights for that specific layer.

  5. Add these transforms to the `convert_state` method:

    ```python
    def convert_state(self, source, target):
        mapping = {
            # ... existing mappings ...
        }
        return io.apply_transforms(
            source, target, mapping=mapping, transforms=[_custom_transform]
        )
    ```

By following this structure, you can create a custom model importer that converts models from a source format to your target format, handling state mapping and any necessary transformations.
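Putting the pieces together, the mapping-plus-transforms flow can be sketched in plain Python. This is a conceptual illustration only: the real `io.apply_transforms` operates on model tensors and handles wildcard expansion internally, but the control flow is similar. All names below are simplified stand-ins, and Python lists stand in for weight tensors:

```python
import re

def apply_transforms(source_state, mapping, transforms):
    # Toy version of the mapping-plus-transforms flow (illustration only).
    target_state = {}
    # 1) Plain renames: copy values from source keys to target keys.
    for src_key, dst_key in mapping.items():
        target_state[dst_key] = source_state[src_key]
    # 2) Custom transforms: for each layer matched by the wildcard, gather
    #    the source values, run the function, store the result.
    for source_keys, target_key, fn in transforms:
        pattern = re.compile(
            "^" + re.escape(source_keys[0]).replace(r"\*", r"(\d+)") + "$"
        )
        layers = sorted(
            m.group(1) for m in (pattern.match(k) for k in source_state) if m
        )
        for layer in layers:
            args = [source_state[k.replace("*", layer)] for k in source_keys]
            target_state[target_key.replace("*", layer)] = fn(*args)
    return target_state

def combine_qkv(q, k, v):
    # Stand-in for torch.cat([q, k, v], dim=0) on real weight tensors.
    return q + k + v

source = {
    "tok_embed": [0.0],
    "model.layers.0.q": [1], "model.layers.0.k": [2], "model.layers.0.v": [3],
    "model.layers.1.q": [4], "model.layers.1.k": [5], "model.layers.1.v": [6],
}
target = apply_transforms(
    source,
    mapping={"tok_embed": "embedding.weight"},
    transforms=[
        (("model.layers.*.q", "model.layers.*.k", "model.layers.*.v"),
         "decoder.layers.*.qkv", combine_qkv),
    ],
)
assert target["decoder.layers.0.qkv"] == [1, 2, 3]
assert target["decoder.layers.1.qkv"] == [4, 5, 6]
```

Note how the wildcard is expanded once per matched layer, so `combine_qkv` runs for each layer with that layer's weights, mirroring how `_combine_qkv_weights` is invoked per layer above.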