LLM API Change Guide#

This guide explains how to modify and manage APIs in TensorRT LLM, focusing on the high-level LLM API.

Overview#

TensorRT LLM provides multiple API levels:

  1. LLM API - The highest-level API (e.g., the `LLM` class)

  2. PyExecutor API - The mid-level API (e.g., the `PyExecutor` class)

This guide focuses on the LLM API, which is the primary interface for most users.

API Types and Stability Guarantees#

TensorRT LLM classifies APIs into two categories:

1. Committed APIs#

  • Stable and guaranteed to remain consistent across releases

  • No breaking changes without major version updates

  • Schema stored in: `tests/unittest/api_stability/references_committed/`

2. Non-committed APIs#

  • Under active development and may change between releases

  • Marked with a `status` field in the docstring:

    • prototype - Early experimental stage

    • beta - More stable but still subject to change

    • deprecated - Scheduled for removal

  • Schema stored in: `tests/unittest/api_stability/references/`

  • See the API status documentation for complete details

API Schema Management#

All API schemas are:

  • Stored as YAML files in the codebase

  • Protected by unit tests in `tests/unittest/api_stability/`

  • Automatically validated to ensure consistency
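To illustrate how such schema-protection tests can work, here is a minimal sketch. The class, fields, and helper below are invented for illustration; the real tests in `tests/unittest/api_stability/` are more thorough. The idea is to compare a class's constructor signature against a parsed reference schema and flag any undocumented arguments:

```python
import inspect

class Llm:
    """Stand-in for the real LLM class (illustration only)."""
    def __init__(self, model: str, max_seq_len: int = 4096):
        self.model = model
        self.max_seq_len = max_seq_len

# A parsed schema, roughly as it might look after loading llm.yaml.
reference_schema = {
    "model": {"type": "str"},
    "max_seq_len": {"type": "int", "default": 4096},
}

def undocumented_args(cls, schema):
    """Return the constructor arguments missing from the reference schema."""
    params = inspect.signature(cls.__init__).parameters
    actual = {name for name in params if name != "self"}
    return actual - set(schema)
```

A test then simply asserts that `undocumented_args(Llm, reference_schema)` is empty, so any new constructor argument fails CI until the schema file is updated.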

API Change Principles#

1. Knob Naming#

Use Semantic Clarity

Argument names should describe what the argument represents, not how it is used internally.

Good: `max_new_tokens` (clear meaning)

Bad: `num` (ambiguous)

Reflect Argument Type and Granularity

  • For boolean knobs, prefix with a verb such as `enable_`.

    Examples: `enable_cache`, `enable_flash_attention`

  • For numerical threshold knobs, suffix with `_limit`, `_size`, `_count`, `_len`, or `_ratio`

    Examples: `max_seq_len`, `prefill_batch_size`

Avoid Redundant Prefixes

Example (in `MoeConfig`):

Good: `backend`

Bad: `moe_backend` (redundant, since it is already in `MoeConfig`)

Use Specific Names for Narrow Scenarios

When adding knobs for specific use cases, make the name convey the restriction clearly via a prefix. It’s acceptable to rename later when the knob becomes more generic or is moved into a dedicated config.

Example (argument to the LLM class):

Good: `rope_scaling_factor` → clearly indicates it applies to RoPE

Bad: `scaling_factor` → too generic and prone to misuse

2. Hierarchical Configuration#

Organize complex or hierarchical arguments into dedicated configuration dataclasses with intuitive and consistent naming.

Guidelines

  • Use the `XxxConfig` suffix consistently

    Examples: `ModelConfig`, `ParallelConfig`, `MoeConfig`

  • Reflect conceptual hierarchy

    The dataclass name should represent a coherent functional unit, not an arbitrary grouping

  • Avoid over-nesting

    Use only one level of configuration hierarchy whenever possible (e.g., `ParallelConfig` nested directly in `LlmArgs`) to balance readability and modularity
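The guidelines above can be sketched with plain dataclasses (field names here are invented, and the real configs are Pydantic-based and much richer):

```python
from dataclasses import dataclass, field

@dataclass
class ParallelConfig:
    """A coherent functional unit: parallelism settings (fields invented)."""
    tp_size: int = 1
    pp_size: int = 1

@dataclass
class LlmArgsSketch:
    """Top-level args with a single level of config nesting."""
    model: str = "model-name"
    parallel_config: ParallelConfig = field(default_factory=ParallelConfig)
```

A caller then writes `LlmArgsSketch(parallel_config=ParallelConfig(tp_size=4))`, reaching any knob in at most one hop.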

3. Prefer LlmArgs Over Environment Variables#

`LlmArgs` is the central place for all configuration knobs. It integrates with our infrastructure to ensure:

  • API Stability

    • Protects committed (stable) APIs

    • GitHub reviewer committee oversees API stability

  • API Status Registration

    • Uncommitted (unstable) APIs must be marked as `"prototype"` or `"beta"`

    • API statuses are displayed in the documentation

  • API Documentation

    • Each knob uses a `Field` with a description

    • Automatically rendered in public documentation

Managing knobs in `LlmArgs` remains scalable and maintainable thanks to our existing infrastructure and review processes.

Drawbacks of Environment Variables:

  • Dispersed across the codebase

  • Lack documentation and discoverability

  • Pose challenges for testing and validation

Guidelines for Adding Knobs:

  • ✅ Add clear, descriptive documentation for each field

  • ✅ It’s fine to add temporary knobs and refine them later

  • ⚠️ Always mark temporary knobs as `"prototype"` if they are not yet stable

  • ✅ Refactor prototype knobs as they mature; promote them to "beta" or "stable"

Modifying LLM Constructor Arguments#

The LLM class accepts numerous configuration parameters for models, runtime, and other components. These are managed through a Pydantic model called `LlmArgs`.

Architecture#

  • The LLM’s `__init__` method parameters map directly to `LlmArgs` fields

  • `LlmArgs` is an alias for `TorchLlmArgs` (defined in `tensorrt_llm/llmapi/llm_args.py`)

  • All arguments are validated and type-checked through Pydantic
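This arrangement can be approximated with a small sketch (all names invented; the real validation is performed by Pydantic, not by hand-written checks):

```python
from dataclasses import dataclass

@dataclass
class TorchLlmArgsSketch:
    """Invented stand-in for TorchLlmArgs."""
    model: str
    max_seq_len: int = 4096

    def __post_init__(self):
        # Minimal stand-in for Pydantic's type and value checking.
        if not isinstance(self.max_seq_len, int) or self.max_seq_len <= 0:
            raise ValueError("max_seq_len must be a positive int")

class LlmSketch:
    """Constructor parameters map one-to-one onto the args object."""
    def __init__(self, model: str, **kwargs):
        self.args = TorchLlmArgsSketch(model=model, **kwargs)
```

Every keyword passed to the constructor lands on the args object, so there is a single source of truth for defaults and validation.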

Adding a New Argument#

Follow these steps to add a new constructor argument:

1. Add the field to TorchLlmArgs#

```python
garbage_collection_gen0_threshold: int = Field(
    default=20000,
    description=(
        "Threshold for Python garbage collection of generation 0 objects. "
        "Lower values trigger more frequent garbage collection."
    ),
    status="beta",  # Required for non-committed arguments
)
```

Field requirements:

  • Type annotation: Required for all fields

  • Default value: Recommended unless the field is mandatory

  • Description: Clear explanation of the parameter’s purpose

  • Status: Required for non-committed arguments (`prototype`, `beta`, etc.)

2. Update the API schema#

Add the field to the appropriate schema file:

  • Non-committed arguments: `tests/unittest/api_stability/references/llm.yaml`

    ```yaml
    garbage_collection_gen0_threshold:
      type: int
      default: 20000
      status: beta  # Must match the status in code
    ```

  • Committed arguments: `tests/unittest/api_stability/references_committed/llm.yaml`

    ```yaml
    garbage_collection_gen0_threshold:
      type: int
      default: 20000
      # No status field for committed arguments
    ```

3. Run validation tests#

```bash
python -m pytest tests/unittest/api_stability/test_llm_api.py
```

Modifying LLM Class Methods#

Public methods in the LLM class constitute the API surface. All changes must be properly documented and tracked.

Implementation Details#

  • The actual implementation is in the `_TorchLLM` class (`llm.py`)

  • Public methods (those not starting with `_`) are automatically exposed as APIs
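The underscore convention can be illustrated with a toy class (class and methods are made up): names that do not start with `_` form the public API surface.

```python
class _TorchLLMSketch:
    """Made-up class: one public method, one private helper."""

    def generate(self, prompt: str) -> str:
        return prompt.upper()

    def _warmup(self) -> None:
        pass

# Names not starting with "_" form the public API surface.
public_api = sorted(
    name for name in vars(_TorchLLMSketch)
    if not name.startswith("_") and callable(getattr(_TorchLLMSketch, name))
)
```

Here `public_api` contains only `generate`; `_warmup` and dunder attributes are filtered out.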

Adding a New Method#

Follow these steps to add a new API method:

1. Implement the method in _TorchLLM#

For non-committed APIs, use the `@set_api_status` decorator:

```python
@set_api_status("beta")
def generate_with_streaming(self, prompts: List[str], **kwargs) -> Iterator[GenerationOutput]:
    """Generate text with streaming output.

    Args:
        prompts: Input prompts for generation
        **kwargs: Additional generation parameters

    Returns:
        Iterator of generation outputs
    """
    # Implementation here
    pass
```

For committed APIs, no decorator is needed:

```python
def generate(self, prompts: List[str], **kwargs) -> GenerationOutput:
    """Generate text from prompts."""
    # Implementation here
    pass
```

2. Update the API schema#

Add the method to the appropriate `llm.yaml` file:

Non-committed API (`tests/unittest/api_stability/references/llm.yaml`):

```yaml
generate_with_streaming:
  status: beta  # Must match @set_api_status
  parameters:
    - name: prompts
      type: List[str]
    - name: kwargs
      type: dict
  returns: Iterator[GenerationOutput]
```

Committed API (`tests/unittest/api_stability/references_committed/llm.yaml`):

```yaml
generate:
  parameters:
    - name: prompts
      type: List[str]
    - name: kwargs
      type: dict
  returns: GenerationOutput
```

Modifying Existing Methods#

When modifying existing methods:

  1. Non-breaking changes (adding optional parameters):

    • Update the method signature

    • Update the schema file

    • No status change needed

  2. Breaking changes (changing required parameters, return types):

    • Only allowed for non-committed APIs

    • Consider deprecation path for beta APIs

    • Update documentation with migration guide
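As a concrete case of a non-breaking change, adding an optional parameter with a default leaves existing call sites valid (names here are illustrative, not from the real API):

```python
from typing import List, Optional

def generate(prompts: List[str], seed: Optional[int] = None) -> List[str]:
    """`seed` was added later; its default keeps old call sites valid."""
    # Echo the prompts; a real implementation would run the model.
    return list(prompts)

# Call sites written before `seed` existed still work unchanged:
old_style = generate(["hello"])
new_style = generate(["hello"], seed=42)
```

Making `seed` required instead would be a breaking change, and so would only be acceptable for non-committed APIs.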

Best Practices#

  1. Documentation: Always include comprehensive docstrings

  2. Type hints: Use proper type annotations for all parameters and returns

  3. Testing: Add unit tests for new methods

  4. Examples: Provide usage examples in the docstring

  5. Validation: Run API stability tests before submitting changes
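Points 1, 2, and 4 can be combined in a single docstring, as in this generic illustration (not TensorRT LLM code): type hints on the signature, a clear description, and a copy-pasteable usage example.

```python
def count_tokens(text: str) -> int:
    """Count whitespace-separated tokens in `text`.

    Example:
        >>> count_tokens("hello world")
        2
    """
    return len(text.split())
```

Docstrings in this shape render cleanly in generated documentation and double as informal usage tests.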

Running Tests#

Validate your changes:

```bash
# Run API stability tests
python -m pytest tests/unittest/api_stability/

# Run specific test for LLM API
python -m pytest tests/unittest/api_stability/test_llm_api.py -v
```

Common Workflows#

Promoting an API from Beta to Committed#

  1. Remove the `@set_api_status("beta")` decorator from the method

  2. Move the schema entry from `tests/unittest/api_stability/references/` to `tests/unittest/api_stability/references_committed/`

  3. Remove the `status` field from the schema

  4. Update any documentation referring to the API’s beta status

Deprecating an API#

  1. Add `@set_api_status("deprecated")` to the method

  2. Update the schema with `status: deprecated`

  3. Add a deprecation warning in the method:

    ```python
    import warnings

    warnings.warn(
        "This method is deprecated and will be removed in v2.0. "
        "Use new_method() instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    ```
  4. Document the migration path
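To show how status tagging and deprecation warnings can fit together, here is a hypothetical reimplementation of `@set_api_status`; the real decorator in the codebase may work differently.

```python
import functools
import warnings

def set_api_status(status: str):
    """Hypothetical sketch: tag a callable with its API status, and warn
    on every call once the status is "deprecated"."""
    def decorate(fn):
        if status == "deprecated":
            @functools.wraps(fn)
            def wrapper(*args, **kwargs):
                warnings.warn(
                    f"{fn.__name__} is deprecated.",
                    DeprecationWarning,
                    stacklevel=2,
                )
                return fn(*args, **kwargs)
            wrapper.__api_status__ = status
            return wrapper
        fn.__api_status__ = status
        return fn
    return decorate

@set_api_status("beta")
def sample_api(x: int) -> int:
    return x + 1
```

With this sketch, the stability tests could read `__api_status__` off each public method and compare it against the `status` field in the YAML schema.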