Support Matrix#
TensorRT-LLM optimizes the performance of a range of well-known models on NVIDIA GPUs. The following sections provide a list of supported GPU architectures as well as important features implemented in TensorRT-LLM.
Models (PyTorch Backend)#
| Model | Modality |
|---|---|
| BERT-based | L |
| Nemotron | L |
| DeepSeek-V3 | L |
| EXAONE 4.0 | L |
| Gemma 3 | L |
| Gemma 3 | L + I |
| HyperCLOVAX-SEED-Vision | L + I |
| VILA | L + I + V |
| LLaVA-NeXT | L + I |
| Llama 3.1, Llama 3, Llama 2, LLaMA | L |
| Llama 4 | L + I |
| Bielik | L |
| Mistral | L |
| Mistral3 | L + I |
| Mixtral | L |
| Llama 3.2 | L |
| Nemotron-3, Nemotron-4, Minitron | L |
| NemotronNAS | L |
| Phi-4 | L |
| Phi-4-multimodal | L + I + A |
| QwQ, Qwen2 | L |
| Qwen2-based | L |
| Qwen2-based | L |
| Qwen2-VL | L + I + V |
| Qwen2.5-VL | L + I + V |
| Qwen3 | L |
| Qwen3MoE | L |
Note:
- L: Language
- I: Image
- V: Video
- A: Audio
Models (TensorRT Backend)#
LLM Models#
Multi-Modal Models[3]#
Hardware#
The following table shows the supported hardware for TensorRT-LLM.
If a GPU architecture is not listed, the TensorRT-LLM team does not develop or test the software on that architecture, and support is limited to community support. In addition, older architectures can have limitations for newer software releases.
| Hardware Compatibility | |
|---|---|
| Operating System | TensorRT-LLM requires Linux x86_64 or Linux aarch64. |
| GPU Model Architectures | |
Software#
The following table shows the supported software for TensorRT-LLM.
| Software Compatibility | |
|---|---|
| Container | |
| TensorRT | |
| Precision | |
Note
Support for FP8 and quantized data types (INT8 or INT4) is not implemented for all models. Refer to Numerical Precision and the examples folder for additional information.
[1] Encoder-Decoder provides general encoder-decoder functionality that supports many encoder-decoder models, such as the T5, BART, Whisper, and NMT families.
[2] Replit Code is not supported with transformers 4.45+.
[3] Multi-modal provides general multi-modal functionality that supports many multi-modal architectures, such as the BLIP-2 and LLaVA families.
[4] Only supports bfloat16 precision.
[5] INT4 AWQ and GPTQ with FP8 activations require SM >= 89.
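The SM >= 89 requirement in footnote [5] refers to the GPU's CUDA compute capability (major, minor) expressed as a two-digit number; for example, compute capability 8.9 corresponds to SM 89. A minimal sketch of such a check follows; the helper names are hypothetical, and it assumes you obtain the device's compute capability yourself (for example, via `torch.cuda.get_device_capability()`).

```python
def sm_version(major: int, minor: int) -> int:
    """Convert a CUDA compute capability (major, minor) to an SM number,
    e.g. (8, 9) -> 89."""
    return major * 10 + minor

def supports_int4_awq_fp8_activations(major: int, minor: int) -> bool:
    """Per footnote [5]: INT4 AWQ and GPTQ with FP8 activations
    require SM >= 89."""
    return sm_version(major, minor) >= 89

# Example: compute capability 8.9 (SM 89) qualifies; 8.6 (SM 86) does not.
print(supports_int4_awq_fp8_activations(8, 9))  # True
print(supports_int4_awq_fp8_activations(8, 6))  # False
```

On a machine with PyTorch and a visible GPU, the `(major, minor)` tuple could be fed in directly; the logic itself is a plain numeric comparison.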