Adding a Model

This document describes how to add a typical decoder-only model in TensorRT LLM.

Step 1. Write Modeling Part

TensorRT LLM provides different levels of APIs:

Low-level functions, for example, concat, add, and sum.
Basic layers, such as Linear and LayerNorm.
High-level layers, such as MLP and Attention.
Base class for typical decoder-only models, such as DecoderModelForCausalLM.

Create a model directory in tensorrt_llm/models, for example my_model.
Write a model.py with TensorRT LLM’s APIs:

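    from tensorrt_llm.layers import (MLP, Attention, ColumnLinear, Embedding,
                                     LayerNorm)
    from tensorrt_llm.models.modeling_utils import (DecoderLayerList,
                                                    DecoderModelForCausalLM,
                                                    PretrainedConfig)
    from tensorrt_llm.module import Module


    # Note: "..." marks arguments that are elided in this skeleton.
    class MyDecoderLayer(Module):

        def __init__(self, config: PretrainedConfig, layer_idx: int):
            super().__init__()
            self.layer_idx = layer_idx
            self.config = config
            self.input_layernorm = LayerNorm(...)
            self.attention = Attention(...)
            self.post_layernorm = LayerNorm(...)
            self.mlp = MLP(...)

        def forward(self, hidden_states, ...):
            # decoder layer forward
            return hidden_states


    class MyModel(Module):

        def __init__(self, config: PretrainedConfig):
            super().__init__()
            self.config = config
            self.vocab_embedding = Embedding(...)
            self.layers = DecoderLayerList(MyDecoderLayer, config)
            self.ln_f = LayerNorm(...)

        def forward(self, input_ids, ...):
            # model forward
            return hidden_states


    class MyModelForCausalLM(DecoderModelForCausalLM):

        def __init__(self, config: PretrainedConfig):
            transformer = MyModel(config)
            lm_head = ColumnLinear(...)
            super().__init__(config, transformer, lm_head)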
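The decoder layer constructor takes (config, layer_idx) because the layer list is given the layer class and builds the individual layers itself. The following is a minimal stand-in sketch in plain Python, not the real tensorrt_llm classes: ToyConfig, ToyDecoderLayer, and ToyDecoderLayerList are hypothetical names, and the assumption that the list creates one layer per index in range(num_hidden_layers) is a simplification (the real DecoderLayerList may also account for pipeline parallelism).

```python
from dataclasses import dataclass


@dataclass
class ToyConfig:
    # Hypothetical stand-in for PretrainedConfig; only the field used here.
    num_hidden_layers: int


class ToyDecoderLayer:
    # Same constructor shape as MyDecoderLayer: (config, layer_idx).
    def __init__(self, config: ToyConfig, layer_idx: int):
        self.config = config
        self.layer_idx = layer_idx

    def forward(self, hidden_states):
        # Placeholder for attention + MLP: record which layer ran.
        return hidden_states + [self.layer_idx]


class ToyDecoderLayerList(list):
    # Assumed behaviour: build one layer per index from the layer class.
    def __init__(self, layer_cls, config: ToyConfig):
        super().__init__(
            layer_cls(config, idx) for idx in range(config.num_hidden_layers))

    def forward(self, hidden_states):
        # Apply the layers in order, feeding each output to the next layer.
        for layer in self:
            hidden_states = layer.forward(hidden_states)
        return hidden_states


config = ToyConfig(num_hidden_layers=3)
layers = ToyDecoderLayerList(ToyDecoderLayer, config)
trace = layers.forward([])
print(trace)
```

The printed trace lists the layer indices in the order the layers ran, which is why each layer needs to know its own layer_idx at construction time.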

Step 2. Implement Weight Conversion

The weights from the source framework need to be converted and bound to the newly added TensorRT LLM model. Here is an example of converting HuggingFace weights:

    from typing import Optional

    from tensorrt_llm.mapping import Mapping


    class MyModelForCausalLM(DecoderModelForCausalLM):

        @classmethod
        def from_hugging_face(
                cls,
                hf_model_dir,
                dtype='float16',
                mapping: Optional[Mapping] = None) -> 'MyModelForCausalLM':
            # create a TensorRT LLM MyModelForCausalLM model object
            # convert HuggingFace checkpoint to TensorRT LLM expected weights dict
            # load the weights to MyModelForCausalLM object
            ...
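A large part of the "convert to expected weights dict" step is usually renaming parameters so that each key matches the attribute path of the new model. The sketch below illustrates only that renaming, under stated assumptions: the HuggingFace-side names are hypothetical, the target names simply mirror the attribute layout of the Step 1 skeleton (transformer.vocab_embedding, transformer.layers.N.attention, transformer.ln_f, lm_head), and strings stand in for tensors. A real conversion typically also handles dtype casting and splitting weights across ranks, which is omitted here.

```python
import re

# (pattern, replacement) pairs. Both sides are illustrative assumptions,
# not the layout of any real checkpoint.
RENAME_RULES = [
    (r"^model\.embed_tokens\.(.+)$", r"transformer.vocab_embedding.\1"),
    (r"^model\.layers\.(\d+)\.self_attn\.(.+)$",
     r"transformer.layers.\1.attention.\2"),
    (r"^model\.layers\.(\d+)\.mlp\.(.+)$", r"transformer.layers.\1.mlp.\2"),
    (r"^model\.norm\.(.+)$", r"transformer.ln_f.\1"),
    (r"^lm_head\.(.+)$", r"lm_head.\1"),
]


def convert_names(hf_weights: dict) -> dict:
    """Return a new dict whose keys follow the target naming scheme."""
    converted = {}
    for hf_name, tensor in hf_weights.items():
        for pattern, replacement in RENAME_RULES:
            new_name, count = re.subn(pattern, replacement, hf_name)
            if count:
                converted[new_name] = tensor
                break
        else:
            # Fail loudly so an unmapped weight is never silently dropped.
            raise KeyError(f"no conversion rule for {hf_name!r}")
    return converted


# Strings replace tensors to keep the sketch dependency-free.
hf_weights = {
    "model.embed_tokens.weight": "emb",
    "model.layers.0.self_attn.qkv.weight": "qkv0",
    "model.layers.0.mlp.fc.weight": "fc0",
    "model.norm.weight": "norm",
    "lm_head.weight": "head",
}
tllm_weights = convert_names(hf_weights)
print(sorted(tllm_weights))
```

Raising on an unmatched name is a deliberate choice: a silently skipped weight would leave part of the model uninitialized and is much harder to diagnose later.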

Optionally, develop a convert_checkpoint.py script in the examples/my_model/ directory for convenient offline weight conversion.
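As a rough starting point, the command-line part of such a script could look like the sketch below. The --model_dir and --output_dir flags are the ones used by the verification commands in Step 4; the --dtype flag is an assumption that mirrors the dtype parameter of from_hugging_face, and the conversion itself is left as comments because it depends on the model.

```python
import argparse


def parse_arguments(argv=None):
    parser = argparse.ArgumentParser(
        description="Convert a HuggingFace checkpoint offline.")
    parser.add_argument("--model_dir", required=True,
                        help="directory of the HuggingFace checkpoint")
    parser.add_argument("--output_dir", required=True,
                        help="directory for the converted checkpoint")
    # Assumed flag, mirroring the dtype parameter of from_hugging_face.
    parser.add_argument("--dtype", default="float16")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_arguments(argv)
    # The model-specific conversion would go here, roughly:
    #   model = MyModelForCausalLM.from_hugging_face(args.model_dir,
    #                                                dtype=args.dtype)
    #   ...then write the converted weights and config to args.output_dir.
    return args


# Example invocation with an explicit argument list instead of sys.argv.
args = main(["--model_dir", "hf_model_dir", "--output_dir", "tllm_ckpt_dir"])
print(args.model_dir, args.output_dir, args.dtype)
```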

Step 3. Register New Model

Register the new model class MyModelForCausalLM in tensorrt_llm/models/__init__.py.

Step 4. Verify New Model

Finally, verify the new model. The typical commands are as follows:

    cd examples/my_model/

    python convert_checkpoint.py --model_dir hf_model_dir --output_dir tllm_ckpt_dir

    trtllm-build --checkpoint_dir tllm_ckpt_dir --output_dir tllm_engine_dir

    # try the model with a single prompt
    python ../run.py --engine_dir tllm_engine_dir --tokenizer_dir hf_model_dir --input_text "Born in north-east France, Soyer trained as a"

    # run summarization task
    python ../summarize.py --engine_dir tllm_engine_dir --hf_model_dir hf_model_dir --test_trt_llm
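When the verification is repeated often, it can help to generate these commands from one place so the directories stay consistent. The sketch below only builds and prints the commands; verification_commands is a hypothetical helper, and every flag is taken from the commands above.

```python
import shlex


def verification_commands(hf_model_dir, ckpt_dir, engine_dir, prompt):
    """Return the Step 4 commands as argument lists, in execution order."""
    return [
        ["python", "convert_checkpoint.py",
         "--model_dir", hf_model_dir, "--output_dir", ckpt_dir],
        ["trtllm-build",
         "--checkpoint_dir", ckpt_dir, "--output_dir", engine_dir],
        ["python", "../run.py", "--engine_dir", engine_dir,
         "--tokenizer_dir", hf_model_dir, "--input_text", prompt],
        ["python", "../summarize.py", "--engine_dir", engine_dir,
         "--hf_model_dir", hf_model_dir, "--test_trt_llm"],
    ]


commands = verification_commands(
    hf_model_dir="hf_model_dir",
    ckpt_dir="tllm_ckpt_dir",
    engine_dir="tllm_engine_dir",
    prompt="Born in north-east France, Soyer trained as a",
)
for argv in commands:
    # shlex.join quotes the prompt so each line can be pasted into a shell.
    print(shlex.join(argv))
```

To execute the steps instead of printing them, each argument list could be passed to subprocess.run(argv, check=True, cwd="examples/my_model"), so that the sequence stops at the first failing step.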

Reference

For more details, read the workflow [./workflow.md] and checkpoint [./checkpoint.md] documents.
