# Torch-TensorRT

PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
Torch-TensorRT brings the power of TensorRT to PyTorch, accelerating inference by up to 5x compared to eager execution with just one line of code.
Stable versions of Torch-TensorRT are published on PyPI:

```bash
pip install torch-tensorrt
```
Nightly versions of Torch-TensorRT are published on the PyTorch package index:

```bash
pip install --pre torch-tensorrt --index-url https://download.pytorch.org/whl/nightly/cu130
```
Torch-TensorRT is also distributed in the ready-to-run NVIDIA NGC PyTorch Container, which includes all dependencies at the proper versions along with example notebooks.
For more advanced installation methods, please see here.
You can use Torch-TensorRT anywhere you use `torch.compile`:
```python
import torch
import torch_tensorrt

model = MyModel().eval().cuda()  # define your model here
x = torch.randn((1, 3, 224, 224)).cuda()  # define what the inputs to the model will look like

optimized_model = torch.compile(model, backend="tensorrt")
optimized_model(x)  # compiled on first run
optimized_model(x)  # this will be fast!
```
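To check the claimed speedup on your own model, a minimal, library-agnostic timing helper can be used. This is a sketch, not part of the Torch-TensorRT API; the `model`, `optimized_model`, and `x` names in the usage comment refer to the example above, and `torch.cuda.synchronize` is passed in because GPU execution is asynchronous.

```python
import time
from statistics import median

def benchmark(fn, iters=50, warmup=5, sync=None):
    """Return the median latency in seconds of calling fn().

    sync, if given, is called to flush asynchronous work before timing stops
    (e.g. torch.cuda.synchronize for CUDA models).
    """
    for _ in range(warmup):
        fn()
    if sync:
        sync()
    times = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync:
            sync()
        times.append(time.perf_counter() - start)
    return median(times)

# Hypothetical usage with the example above (requires a CUDA environment):
# eager = benchmark(lambda: model(x), sync=torch.cuda.synchronize)
# compiled = benchmark(lambda: optimized_model(x), sync=torch.cuda.synchronize)
# print(f"speedup: {eager / compiled:.2f}x")
```

Using the median rather than the mean makes the measurement robust to one-off stalls such as the first-run compilation.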
If you want to optimize your model ahead-of-time and/or deploy in a C++ environment, Torch-TensorRT provides an export-style workflow that serializes an optimized module. This module can be deployed in PyTorch or with libtorch (i.e. without a Python dependency).
```python
import torch
import torch_tensorrt

model = MyModel().eval().cuda()  # define your model here
inputs = [torch.randn((1, 3, 224, 224)).cuda()]  # define a list of representative inputs here

trt_gm = torch_tensorrt.compile(model, ir="dynamo", inputs=inputs)
torch_tensorrt.save(trt_gm, "trt.ep", inputs=inputs)
# PyTorch only supports the Python runtime for an ExportedProgram.
# For C++ deployment, use a TorchScript file.
torch_tensorrt.save(trt_gm, "trt.ts", output_format="torchscript", inputs=inputs)
```
```python
import torch
import torch_tensorrt

inputs = [torch.randn((1, 3, 224, 224)).cuda()]  # your inputs go here

# You can run this in a new python session!
model = torch.export.load("trt.ep").module()
# model = torch_tensorrt.load("trt.ep").module()  # this also works
model(*inputs)
```
```cpp
#include "torch/script.h"
#include "torch_tensorrt/torch_tensorrt.h"

auto trt_mod = torch::jit::load("trt.ts");
auto input_tensor = [...];  // fill this with your inputs
auto results = trt_mod.forward({input_tensor});
```
- Double PyTorch Inference Speed for Diffusion Models Using Torch-TensorRT
- Up to 50% faster Stable Diffusion inference with one line of code
- Optimize LLMs from Hugging Face with Torch-TensorRT
- Run your model in FP8 with Torch-TensorRT
- Accelerated Inference in PyTorch 2.X with Torch-TensorRT
- Tools to resolve graph breaks and boost performance [coming soon]
- Tech Talk (GTC '23)
- Documentation
| Platform | Support |
|---|---|
| Linux AMD64 / GPU | Supported |
| Linux SBSA / GPU | Supported |
| Windows / GPU | Supported (Dynamo only) |
| Linux Jetson / GPU | Source Compilation Supported on JetPack-4.4+ |
| Linux Jetson / DLA | Source Compilation Supported on JetPack-4.4+ |
| Linux ppc64le / GPU | Not supported |
Note: Refer to the NVIDIA L4T PyTorch NGC container for PyTorch libraries on JetPack.
The following dependencies were used to verify the test cases. Torch-TensorRT can work with other versions, but the tests are not guaranteed to pass.
- Bazel 8.1.1
- Libtorch 2.10.0.dev (latest nightly)
- CUDA 13.0 (CUDA 12.6 on Jetson)
- TensorRT 10.14.1.48 (TensorRT 10.3 on Jetson)
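Since the tests are only verified against the versions above, it can be useful to compare your installed versions against them at runtime. The sketch below is not part of Torch-TensorRT; `parse_version` and `at_least` are hypothetical helpers, and the commented usage assumes the standard `__version__` attributes that `torch` and `tensorrt` expose.

```python
def parse_version(v):
    """Split a dotted version string like '10.14.1.48' into an int tuple,
    stopping at non-numeric suffixes such as 'dev' in '2.10.0.dev'."""
    parts = []
    for p in v.split("."):
        if p.isdigit():
            parts.append(int(p))
        else:
            break
    return tuple(parts)

def at_least(installed, required):
    """True if the installed version is >= the required version."""
    return parse_version(installed) >= parse_version(required)

# Hypothetical usage in a real environment:
# import torch, tensorrt
# assert at_least(torch.__version__, "2.10.0")
# assert at_least(tensorrt.__version__, "10.14.1.48")
```

Tuple comparison handles mismatched lengths correctly, so "10.14.1.48" compares as newer than "10.3".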
Deprecation is used to inform developers that some APIs and tools are no longer recommended for use. Beginning with version 2.3, Torch-TensorRT has the following deprecation policy:
Deprecation notices are communicated in the Release Notes. Deprecated API functions will have a statement in the source documenting when they were deprecated. Deprecated methods and classes will issue deprecation warnings at runtime, if they are used. Torch-TensorRT provides a 6-month migration period after the deprecation. APIs and tools continue to work during the migration period. After the migration period ends, APIs and tools are removed in a manner consistent with semantic versioning.
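The runtime-warning behavior described above can be illustrated with the standard `warnings` module. This is a generic sketch of the pattern, not Torch-TensorRT's actual deprecation machinery; the `deprecated` decorator and `old_api` function are hypothetical.

```python
import functools
import warnings

def deprecated(since, removal):
    """Decorator that makes a function emit a DeprecationWarning when called."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{fn.__name__} is deprecated since {since} "
                f"and will be removed in {removal}",
                DeprecationWarning,
                stacklevel=2,  # point the warning at the caller, not the wrapper
            )
            return fn(*args, **kwargs)
        return wrapper
    return decorator

@deprecated(since="2.3", removal="a future major release")
def old_api(x):
    return x * 2
```

During the six-month migration period the decorated function keeps working; the warning gives callers time to move off the API before it is removed.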
Take a look at CONTRIBUTING.md.
The Torch-TensorRT license can be found in the LICENSE file. It is licensed under a BSD-style license.