PyTorch Backend#
Note: This feature is currently in beta, and the related API is subject to change in future versions.
To enhance the usability of the system and improve developer efficiency, TensorRT LLM provides a new backend based on PyTorch.
The PyTorch backend of TensorRT LLM is available in version 0.17 and later. You can try it by importing tensorrt_llm._torch.
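Before relying on the backend, it can help to confirm that the installed release is new enough and that the module is importable. The following is a minimal sketch, not part of TensorRT LLM: the helper names version_at_least and pytorch_backend_available are hypothetical, and the distribution name "tensorrt_llm" passed to importlib.metadata is an assumption about how the package is installed.

```python
import importlib.metadata
import importlib.util
import re


def version_at_least(version: str, minimum: tuple[int, int] = (0, 17)) -> bool:
    """Compare the leading major.minor of a version string against a minimum."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        # Unparseable version strings are treated as too old.
        return False
    return (int(match.group(1)), int(match.group(2))) >= minimum


def pytorch_backend_available() -> bool:
    """Return True if a recent enough tensorrt_llm with the _torch module is found."""
    try:
        # Assumption: the distribution is registered under the name "tensorrt_llm".
        installed = importlib.metadata.version("tensorrt_llm")
    except importlib.metadata.PackageNotFoundError:
        return False
    if not version_at_least(installed):
        return False
    try:
        # find_spec imports the parent package, which may fail on broken installs.
        return importlib.util.find_spec("tensorrt_llm._torch") is not None
    except ImportError:
        return False


if __name__ == "__main__":
    print("PyTorch backend available:", pytorch_backend_available())
```

The check compares only the major and minor components, so pre-release or post-release suffixes in the version string do not affect the result.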
Quick Start#
Here is a simple example showing how to use the tensorrt_llm.LLM API with a Llama model.
from tensorrt_llm import LLM, SamplingParams


def main():

    # Model could accept HF model name, a path to local HF model,
    # or TensorRT Model Optimizer's quantized checkpoints like nvidia/Llama-3.1-8B-Instruct-FP8 on HF.
    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

    # Sample prompts.
    prompts = [
        "Hello, my name is",
        "The president of the United States is",
        "The capital of France is",
        "The future of AI is",
    ]

    # Create sampling params.
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

    for output in llm.generate(prompts, sampling_params):
        print(
            f"Prompt: {output.prompt!r}, Generated text: {output.outputs[0].text!r}"
        )

    # Got output like
    # Prompt: 'Hello, my name is', Generated text: '\n\nJane Smith. I am a student pursuing my degree in Computer Science at [university]. I enjoy learning new things, especially technology and programming'
    # Prompt: 'The president of the United States is', Generated text: 'likely to nominate a new Supreme Court justice to fill the seat vacated by the death of Antonin Scalia. The Senate should vote to confirm the'
    # Prompt: 'The capital of France is', Generated text: 'Paris.'
    # Prompt: 'The future of AI is', Generated text: 'an exciting time for us. We are constantly researching, developing, and improving our platform to create the most advanced and efficient model available. We are'


if __name__ == '__main__':
    main()
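The quick start passes temperature=0.8 and top_p=0.95 to SamplingParams. As a rough illustration of what those two knobs conventionally mean, here is a pure-Python sketch of temperature scaling followed by nucleus (top-p) filtering. It is an assumption-laden toy, not TensorRT LLM's sampler: the function names are hypothetical and the logits are made up.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose total mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # Renormalize the kept tokens; everything else gets zero probability.
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered


def sample_token(logits, temperature=0.8, top_p=0.95, rng=None):
    """Sample one token index from toy logits (illustrative only)."""
    rng = rng or random.Random()
    probs = top_p_filter(softmax_with_temperature(logits, temperature), top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical scores for four tokens
    print(sample_token(toy_logits, rng=random.Random(0)))
```

In this sketch, a lower temperature concentrates probability on the top tokens, while a lower top_p discards more of the low-probability tail before sampling.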
Features#
Developer Guide#
Key Components#
Known Issues#
The PyTorch backend on SBSA is incompatible with bare metal environments like Ubuntu 24.04. Please use the PyTorch NGC Container for optimal support on SBSA platforms.
Prototype Features#