Building from Source Code on Linux#

This document provides instructions for building TensorRT LLM from source code on Linux. Building from source is recommended for achieving optimal performance, enabling debugging capabilities, or when you need a different GNU CXX11 ABI configuration than what is available in the pre-built TensorRT LLM wheel on PyPI. Note that the current pre-built TensorRT LLM wheel on PyPI is linked against PyTorch 2.7.0 and subsequent versions, which use the new CXX11 ABI.

Prerequisites#

Use Docker to build and run TensorRT LLM. Instructions to install an environment to run Docker containers for the NVIDIA platform can be found here.

If you intend to build any TensorRT LLM artifacts, such as any of the container images (note that there exist pre-built develop and release container images in NGC), or the TensorRT LLM Python wheel, you first need to clone the TensorRT LLM repository:

# TensorRT LLM uses git-lfs, which needs to be installed in advance.
apt-get update && apt-get -y install git git-lfs
git lfs install

git clone https://github.com/NVIDIA/TensorRT-LLM.git
cd TensorRT-LLM
git submodule update --init --recursive
git lfs pull

Building a TensorRT LLM Docker Image#

There are two options to create a TensorRT LLM Docker image. The approximate disk space required to build the image is 63 GB.

Option 1: Build TensorRT LLM in One Step#

Tip

If you just want to run TensorRT LLM, you can instead use the pre-built TensorRT LLM Release container images.

TensorRT LLM provides a simple command to create a Docker image. Note that if you plan to develop on TensorRT LLM, we recommend using Option 2: Build TensorRT LLM Step-By-Step.

make -C docker release_build

You can add the optional CUDA_ARCHS="<list of architectures in CMake format>" argument to specify which architectures should be supported by TensorRT LLM. It restricts the supported GPU architectures but helps reduce compilation time:

# Restrict the compilation to Ada and Hopper architectures.
make -C docker release_build CUDA_ARCHS="89-real;90-real"
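The entries in CUDA_ARCHS follow CMake's CUDA architecture syntax: a compute capability with the dot removed, optionally suffixed with -real to emit SASS for that architecture only (no PTX). As a sketch of that mapping, the following hypothetical helper (not part of the repository) builds such a list from compute capabilities:

```python
def cmake_cuda_archs(compute_caps):
    """Convert compute capabilities (e.g. 8.9 for Ada, 9.0 for Hopper)
    into a CMake-style CUDA architecture list such as "89-real;90-real".
    The "-real" suffix restricts code generation to that architecture."""
    return ";".join(f"{int(round(cc * 10))}-real" for cc in compute_caps)

# Ada (8.9) and Hopper (9.0), matching the example above.
print(cmake_cuda_archs([8.9, 9.0]))  # 89-real;90-real
```

You can query the compute capability of your installed GPUs with `nvidia-smi --query-gpu=compute_cap --format=csv,noheader` to decide which entries to include.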

After the image is built, the Docker container can be run.

make -C docker release_run

The make command supports the LOCAL_USER=1 argument to switch to the local user account instead of root inside the container. The examples of TensorRT LLM are installed in the /app/tensorrt_llm/examples directory.

Since TensorRT LLM has been built and installed, you can skip the remaining steps.

Option 2: Container for building TensorRT LLM Step-by-Step#

If you are looking for more flexibility, TensorRT LLM has commands to create and run a development container in which TensorRT LLM can be built.

Tip

As an alternative to building the container image following the instructions below, you can pull a pre-built TensorRT LLM Develop container image from NGC (see here for information on container tags). Follow the linked catalog entry to enter a new container based on the pre-built container image, with the TensorRT LLM source repository mounted into it. You can then skip this section and continue straight to building TensorRT LLM.

On systems with GNU make

  1. Create a Docker image for development. The image will be tagged locally with tensorrt_llm/devel:latest.

    make -C docker build

  2. Run the container.

    make -C docker run

    If you prefer to work with your own user account in that container, instead of root, add the LOCAL_USER=1 option.

    make -C docker run LOCAL_USER=1

If you wish to use enroot instead of Docker, you can build a sqsh file that has an environment identical to the development image tensorrt_llm/devel:latest as follows.

  1. Allocate a compute node:

    salloc --nodes=1

  2. Create a sqsh file with essential TensorRT LLM dependencies installed:

    # Using default sqsh filename (enroot/tensorrt_llm.devel.sqsh)
    make -C enroot build_sqsh
    # Or specify a custom path (optional)
    make -C enroot build_sqsh SQSH_PATH=/path/to/dev_trtllm_image.sqsh

  3. Once this squash file is ready, you can follow the steps under Build TensorRT LLM by launching an enroot sandbox from dev_trtllm_image.sqsh. To do this, proceed as follows:

    export SQSH_PATH=/path/to/dev_trtllm_image.sqsh
    # Start a pseudo terminal for interactive session
    make -C enroot run_sqsh
    # Or, you could run commands directly
    make -C enroot run_sqsh RUN_CMD="python3 scripts/build_wheel.py"

On systems without GNU make

  1. Create a Docker image for development.

    docker build --pull \
      --target devel \
      --file docker/Dockerfile.multi \
      --tag tensorrt_llm/devel:latest \
      .

  2. Run the container.

    docker run --rm -it \
      --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --gpus=all \
      --volume ${PWD}:/code/tensorrt_llm \
      --workdir /code/tensorrt_llm \
      tensorrt_llm/devel:latest

    Note: please make sure to set --ipc=host as a docker run argument to avoid Bus error (core dumped).

Once inside the container, follow the next steps to build TensorRT LLM from source.

Advanced topics#

For more information on building and running various TensorRT LLM container images, check NVIDIA/TensorRT-LLM.

Build TensorRT LLM#

Option 1: Full Build with C++ Compilation#

The following command compiles the C++ code and packages the compiled libraries along with the Python files into a wheel. When developing C++ code, you need this full build command to apply your code changes.

# To build the TensorRT LLM code.
python3 ./scripts/build_wheel.py

Once the wheel is built, install it by:

pip install ./build/tensorrt_llm*.whl

Alternatively, you can use editable installation, which is convenient if you also develop Python code.

pip install -e .

By default, build_wheel.py enables incremental builds. To clean the build directory, add the --clean option:

python3 ./scripts/build_wheel.py --clean

It is possible to restrict the compilation of TensorRT LLM to specific CUDA architectures. For that purpose, the build_wheel.py script accepts a semicolon-separated list of CUDA architectures, as shown in the following example:

# Build TensorRT LLM for Ampere.
python3 ./scripts/build_wheel.py --cuda_architectures "80-real;86-real"

To use the C++ benchmark scripts under benchmark/cpp, for example gptManagerBenchmark.cpp, add the --benchmarks option:

python3 ./scripts/build_wheel.py --benchmarks

Refer to the Hardware section for a list of architectures.

Building the Python Bindings for the C++ Runtime#

The C++ Runtime can be exposed to Python via bindings. This feature is enabled by the default build options.

python3 ./scripts/build_wheel.py

After installing the resulting wheel as described above, the C++ Runtime bindings will be available in the tensorrt_llm.bindings package. Running help on this package in a Python interpreter will provide an overview of the relevant classes. The associated unit tests should also be consulted to understand the API.

This feature will not be enabled when building only the C++ runtime.

Linking with the TensorRT LLM C++ Runtime#

The build_wheel.py script will also compile the library containing the C++ runtime of TensorRT LLM. If Python support and torch modules are not required, the script provides the option --cpp_only, which restricts the build to the C++ runtime only.

python3 ./scripts/build_wheel.py --cuda_architectures "80-real;86-real" --cpp_only --clean

This is particularly useful for avoiding linking issues that may arise with older versions of torch (prior to 2.7.0) due to the Dual ABI support in GCC. The --clean option removes the build directory before starting a new build. By default, TensorRT LLM uses cpp/build as the build directory, but you can specify a different location with the --build_dir option. For a complete list of available build options, run python3 ./scripts/build_wheel.py --help.

The shared library can be found in the following location:

cpp/build/tensorrt_llm/libtensorrt_llm.so

In addition, link against the library containing the LLM plugins for TensorRT.

cpp/build/tensorrt_llm/plugins/libnvinfer_plugin_tensorrt_llm.so

Supported C++ Header Files#

When using TensorRT LLM, you need to add the cpp and cpp/include directories to the project’s include paths. Only header files contained in cpp/include are part of the supported API and may be directly included. Other headers contained under cpp should not be included directly since they might change in future versions.

Option 2: Python-Only Build without C++ Compilation#

If you only need to modify Python code, it is possible to package and install TensorRT LLM without compilation.

# Package TensorRT LLM wheel.
TRTLLM_USE_PRECOMPILED=1 pip wheel . --no-deps --wheel-dir ./build
# Install TensorRT LLM wheel.
pip install ./build/tensorrt_llm*.whl

Alternatively, you can use editable installation for convenience during Python development.

TRTLLM_USE_PRECOMPILED=1 pip install -e .

Setting TRTLLM_USE_PRECOMPILED=1 downloads a prebuilt wheel of the version specified in tensorrt_llm/version.py and extracts its compiled libraries into your current directory, thus skipping C++ compilation. This version can be overridden by specifying TRTLLM_USE_PRECOMPILED=x.y.z.
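The version-selection behavior of this environment variable can be sketched as follows. This is a simplified illustration of the semantics described above, not the actual logic in the build scripts; the function name and the "0"/unset handling are assumptions for the sketch:

```python
import os

def precompiled_wheel_version(local_version):
    """Illustrative sketch: return the precompiled wheel version to
    download, or None to compile the C++ code from source.
    `local_version` stands in for the value in tensorrt_llm/version.py."""
    value = os.environ.get("TRTLLM_USE_PRECOMPILED", "")
    if not value or value == "0":
        return None               # unset: perform a full C++ build
    if value == "1":
        return local_version      # use the version from version.py
    return value                  # explicit x.y.z override

os.environ["TRTLLM_USE_PRECOMPILED"] = "1"
print(precompiled_wheel_version("0.16.0"))  # 0.16.0
os.environ["TRTLLM_USE_PRECOMPILED"] = "0.17.0"
print(precompiled_wheel_version("0.16.0"))  # 0.17.0
```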

You can specify a custom URL or local path for downloading using TRTLLM_PRECOMPILED_LOCATION. For example, to use version 0.16.0 from PyPI:

TRTLLM_PRECOMPILED_LOCATION=https://pypi.nvidia.com/tensorrt-llm/tensorrt_llm-0.16.0-cp312-cp312-linux_x86_64.whl pip install -e .

Known Limitations#

When using TRTLLM_PRECOMPILED_LOCATION, ensure that your wheel is compiled based on the same version of C++ code as your current directory; any discrepancies may lead to compatibility issues.
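One way to catch an obvious mismatch early is to inspect the tags in the wheel filename before installing. The following is a minimal sketch (a hypothetical helper, simplified string splitting rather than full PEP 425 tag resolution):

```python
import sys

def wheel_tags(wheel_name):
    """Split a wheel filename such as
    tensorrt_llm-0.16.0-cp312-cp312-linux_x86_64.whl
    into (version, python_tag, abi_tag, platform_tag)."""
    stem = wheel_name.removesuffix(".whl")
    _name, version, py_tag, abi_tag, platform_tag = stem.split("-", 4)
    return version, py_tag, abi_tag, platform_tag

version, py_tag, _abi, platform_tag = wheel_tags(
    "tensorrt_llm-0.16.0-cp312-cp312-linux_x86_64.whl"
)
# The python tag should match the running interpreter (e.g. cp312),
# and the platform tag should match your system (e.g. linux_x86_64).
print(py_tag == f"cp{sys.version_info.major}{sys.version_info.minor}")
```

Note that matching tags only rule out interpreter and platform mismatches; they cannot detect the C++ source discrepancies described above.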