Support Matrix#
TensorRT-LLM optimizes the performance of a range of well-known models on NVIDIA GPUs. The following sections provide a list of supported GPU architectures as well as important features implemented in TensorRT-LLM.
Models (PyTorch Backend)#
| Model | Modality |
|---|---|
| BERT-based | L |
| Nemotron | L |
| DeepSeek-V3 | L |
| EXAONE 4.0 | L |
| Gemma 3 | L |
| Gemma 3 | L + I |
| HyperCLOVAX-SEED-Vision | L + I |
| VILA | L + I + V |
| LLaVA-NeXT | L + I |
| Llama 3.1, Llama 3, Llama 2, LLaMA | L |
| Llama 4 | L + I |
| Bielik | L |
| Mistral | L |
| Mistral3 | L + I |
| Mixtral | L |
| Llama 3.2 | L |
| Nemotron-3, Nemotron-4, Minitron | L |
| NemotronNAS | L |
| Phi-4 | L |
| Phi-4-multimodal | L + I + A |
| QwQ, Qwen2 | L |
| Qwen2-based | L |
| Qwen2-based | L |
| Qwen2-VL | L + I + V |
| Qwen2.5-VL | L + I + V |
| Qwen3 | L |
| Qwen3MoE | L |
Note:
- L: Language
- I: Image
- V: Video
- A: Audio
Models (TensorRT Backend)#
LLM Models#
Multi-Modal Models[3]#
Hardware#
The following table shows the supported hardware for TensorRT-LLM.
If a GPU architecture is not listed, the TensorRT-LLM team does not develop or test the software on that architecture, and support is limited to community support. In addition, older architectures can have limitations for newer software releases.
| Hardware Compatibility | |
|---|---|
| Operating System | TensorRT-LLM requires Linux x86_64 or Linux aarch64. |
| GPU Model Architectures | |
Software#
The following table shows the supported software for TensorRT-LLM.
| Software Compatibility | |
|---|---|
| Container | |
| TensorRT | |
| Precision | |
Note
Support for FP8 and quantized data types (INT8 or INT4) is not implemented for all models. Refer to Numerical Precision and the examples folder for additional information.
[1] Encoder-Decoder provides general encoder-decoder functionality that supports many encoder-decoder models, such as the T5, BART, Whisper, and NMT families.
[2] Replit Code is not supported with transformers 4.45+.
[3] Multi-modal provides general multi-modal functionality that supports many multi-modal architectures, such as the BLIP-2 and LLaVA families.
[4] Only supports bfloat16 precision.
[5] INT4 AWQ and GPTQ with FP8 activations require SM >= 89.
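The SM >= 89 requirement in footnote [5] refers to the GPU's CUDA compute capability (major, minor) expressed as a two-digit number; for example, compute capability 8.9 corresponds to SM 89. A minimal sketch of such a check follows; the helper names are hypothetical, and it assumes you obtain the device's compute capability yourself (for example, via `torch.cuda.get_device_capability()`).

```python
def sm_version(major: int, minor: int) -> int:
    """Convert a CUDA compute capability (major, minor) to an SM number,
    e.g. (8, 9) -> 89."""
    return major * 10 + minor

def supports_int4_awq_fp8_activations(major: int, minor: int) -> bool:
    """Per footnote [5]: INT4 AWQ and GPTQ with FP8 activations
    require SM >= 89."""
    return sm_version(major, minor) >= 89

# Example: compute capability 8.9 (SM 89) qualifies; 8.6 (SM 86) does not.
print(supports_int4_awq_fp8_activations(8, 9))  # True
print(supports_int4_awq_fp8_activations(8, 6))  # False
```

On a machine with PyTorch and a visible GPU, the `(major, minor)` tuple could be fed in directly; the logic itself is a plain numeric comparison.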