API Reference

Note

Since version 1.0, we have attached a status label to the LLM, LlmArgs, and TorchLlmArgs classes.

stable - The item is stable and will remain consistent.
prototype - The item is a prototype and is subject to change.
beta - The item is in beta and approaching stability.
deprecated - The item is deprecated and will be removed in a future release.

class tensorrt_llm.llmapi.LLM(
model: str | Path,
tokenizer: str | Path | TokenizerBase | PreTrainedTokenizerBase | None = None,
tokenizer_mode: Literal['auto', 'slow'] = 'auto',
skip_tokenizer_init: bool = False,
trust_remote_code: bool = False,
tensor_parallel_size: int = 1,
dtype: str = 'auto',
revision: str | None = None,
tokenizer_revision: str | None = None,
**kwargs: Any,
)

Bases: _TorchLLM

The LLM class is the main class for running an LLM.

For more details about the arguments, please refer to TorchLlmArgs.

Parameters:

model (Union[str,pathlib.Path]) – stable The path to the model checkpoint or the model name from the Hugging Face Hub.
tokenizer (Union[str,pathlib.Path,transformers.tokenization_utils_base.PreTrainedTokenizerBase,tensorrt_llm.llmapi.tokenizer.TokenizerBase,NoneType]) – stable The path to the tokenizer checkpoint or the tokenizer name from the Hugging Face Hub. Defaults to None.
tokenizer_mode (Literal['auto','slow']) – stable The mode used to initialize the tokenizer. Defaults to auto.
skip_tokenizer_init (bool) – stable Whether to skip tokenizer initialization. Defaults to False.
trust_remote_code (bool) – stable Whether to trust remote code. Defaults to False.
tensor_parallel_size (int) – stable The tensor parallel size. Defaults to 1.
dtype (str) – stable The data type to use for the model. Defaults to auto.
revision (Optional[str]) – stable The revision to use for the model. Defaults to None.
tokenizer_revision (Optional[str]) – stable The revision to use for the tokenizer. Defaults to None.
pipeline_parallel_size (int) – stable The pipeline parallel size. Defaults to 1.
context_parallel_size (int) – stable The context parallel size. Defaults to 1.
gpus_per_node (Optional[int]) – beta The number of GPUs per node. Defaults to None.
moe_cluster_parallel_size (Optional[int]) – beta The cluster parallel size for MoE models' expert weights. Defaults to None.
moe_tensor_parallel_size (Optional[int]) – stable The tensor parallel size for MoE models' expert weights. Defaults to None.
moe_expert_parallel_size (Optional[int]) – stable The expert parallel size for MoE models' expert weights. Defaults to None.
enable_attention_dp (bool) – beta Enable attention data parallelism. Defaults to False.
enable_lm_head_tp_in_adp (bool) – prototype Enable LM head TP in attention DP. Defaults to False.
pp_partition (Optional[List[int]]) – prototype Pipeline parallel partition: a list giving the number of layers on each rank. Defaults to None.
cp_config (Optional[dict]) – prototype Context parallel config. Defaults to None.
load_format (Union[str,tensorrt_llm.llmapi.llm_args.LoadFormat]) – stable How to load the model weights. By default, the weight type is detected from the model checkpoint. Defaults to 0.
fail_fast_on_attention_window_too_large (bool) – prototype Fail fast when the attention window is too large to fit even a single sequence in the KV cache. Defaults to False.
enable_lora (bool) – stable Enable LoRA. Defaults to False.
lora_config (Optional[tensorrt_llm.lora_helper.LoraConfig]) – stable LoRA configuration for the model. Defaults to None.
kv_cache_config (tensorrt_llm.llmapi.llm_args.KvCacheConfig) – stable KV cache config. Defaults to None.
enable_chunked_prefill (bool) – stable Enable chunked prefill. Defaults to False.
guided_decoding_backend (Optional[Literal['xgrammar','llguidance']]) – stable Guided decoding backend. llguidance is supported in the PyTorch backend only. Defaults to None.
batched_logits_processor (Optional[tensorrt_llm.sampling_params.BatchedLogitsProcessor]) – stable Batched logits processor. Defaults to None.
iter_stats_max_iterations (Optional[int]) – prototype The maximum number of iterations for iter stats. Defaults to None.
request_stats_max_iterations (Optional[int]) – prototype The maximum number of iterations for request stats. Defaults to None.
peft_cache_config (Optional[tensorrt_llm.llmapi.llm_args.PeftCacheConfig]) – prototype PEFT cache config. Defaults to None.
scheduler_config (tensorrt_llm.llmapi.llm_args.SchedulerConfig) – prototype Scheduler config. Defaults to None.
cache_transceiver_config (Optional[tensorrt_llm.llmapi.llm_args.CacheTransceiverConfig]) – prototype Cache transceiver config. Defaults to None.
sparse_attention_config (Union[tensorrt_llm.llmapi.llm_args.RocketSparseAttentionConfig,tensorrt_llm.llmapi.llm_args.DeepSeekSparseAttentionConfig,NoneType]) – prototype Sparse attention config. Defaults to None.
speculative_config (Union[tensorrt_llm.llmapi.llm_args.DraftTargetDecodingConfig,tensorrt_llm.llmapi.llm_args.EagleDecodingConfig,tensorrt_llm.llmapi.llm_args.LookaheadDecodingConfig,tensorrt_llm.llmapi.llm_args.MedusaDecodingConfig,tensorrt_llm.llmapi.llm_args.MTPDecodingConfig,tensorrt_llm.llmapi.llm_args.NGramDecodingConfig,tensorrt_llm.llmapi.llm_args.UserProvidedDecodingConfig,tensorrt_llm.llmapi.llm_args.SaveHiddenStatesDecodingConfig,tensorrt_llm.llmapi.llm_args.AutoDecodingConfig,NoneType]) – stable Speculative decoding config. Defaults to None.
max_batch_size (Optional[int]) – stable The maximum batch size. Defaults to None.
max_input_len (Optional[int]) – stable The maximum input length. Defaults to None.
max_seq_len (Optional[int]) – stable The maximum sequence length. Defaults to None.
max_beam_width (Optional[int]) – stable The maximum beam width. Defaults to None.
max_num_tokens (Optional[int]) – stable The maximum number of tokens. Defaults to 8192.
gather_generation_logits (bool) – prototype Gather generation logits. Defaults to False.
num_postprocess_workers (int) – prototype The number of processes used for postprocessing the generated tokens, including detokenization. Defaults to 0.
postprocess_tokenizer_dir (Optional[str]) – prototype The path to the tokenizer directory for postprocessing. Defaults to None.
reasoning_parser (Optional[str]) – prototype The parser used to separate reasoning content from the output. Defaults to None.
otlp_traces_endpoint (Optional[str]) – prototype Target URL to which OpenTelemetry traces will be sent. Defaults to None.
return_perf_metrics (bool) – prototype Return perf metrics. Defaults to False.
orchestrator_type (Optional[Literal['rpc','ray']]) – prototype The orchestrator type to use. None means MPI is used. Defaults to None.
garbage_collection_gen0_threshold (int) – beta Threshold for Python garbage collection of generation 0 objects. Lower values trigger more frequent garbage collection. Defaults to 20000.
cuda_graph_config (Optional[tensorrt_llm.llmapi.llm_args.CudaGraphConfig]) – beta CUDA graph config. If set, CUDA graphs are used for decoding. CUDA graphs are only created for the batch sizes in cuda_graph_config.batch_sizes, and are enabled only for batches that consist entirely of decoding requests (it is hard to capture a single graph with prefill requests because the input shapes are a function of the sequence lengths). Note that each CUDA graph can use up to 200 MB of extra memory. Defaults to None.
attention_dp_config (Optional[tensorrt_llm.llmapi.llm_args.AttentionDpConfig]) – beta Optimized load balancing for the DP attention scheduler. Defaults to None.
disable_overlap_scheduler (bool) – beta Disable the overlap scheduler. Defaults to False.
moe_config (tensorrt_llm.llmapi.llm_args.MoeConfig) – beta MoE config. Defaults to None.
attn_backend (str) – beta Attention backend to use. Defaults to TRTLLM.
sampler_type (Union[str,tensorrt_llm.llmapi.llm_args.SamplerType]) – beta The type of sampler to use. Options are TRTLLMSampler, TorchSampler, or auto; auto uses TorchSampler unless beam search is requested. Defaults to auto.
enable_iter_perf_stats (bool) – prototype Enable iteration performance statistics. Defaults to False.
enable_iter_req_stats (bool) – prototype If true, enables per-request stats for each iteration. enable_iter_perf_stats must also be true to get request stats. Defaults to False.
print_iter_log (bool) – beta Print iteration logs. Defaults to False.
perf_metrics_max_requests (int) – prototype The maximum number of requests for perf metrics. Must also set request_perf_metrics to true to get perf metrics. Defaults to 0.
batch_wait_timeout_ms (float) – prototype If greater than 0, the request queue may wait up to batch_wait_timeout_ms to receive max_batch_size requests when fewer than max_batch_size requests are currently available. If 0, no waiting occurs. Defaults to 0.
batch_wait_timeout_iters (int) – prototype Maximum number of iterations the scheduler waits to accumulate newly arriving requests for better GPU utilization. If greater than 0, the scheduler delays batch processing to gather more requests, up to the specified iteration limit. If 0, iteration-based batching delays are disabled. Defaults to 0.
batch_wait_max_tokens_ratio (float) – prototype Token accumulation threshold ratio for batch scheduling. If greater than 0, the scheduler accumulates requests locally until the total token count reaches batch_wait_max_tokens_ratio * max_num_tokens, which improves GPU utilization by ensuring adequate batch sizes. If 0, token-based batching delays are disabled. Defaults to 0.
torch_compile_config (Optional[tensorrt_llm.llmapi.llm_args.TorchCompileConfig]) – prototype Torch compile config. Defaults to None.
enable_autotuner (bool) – prototype Enable the autotuner for all tunable ops. Setting this to False is intended for debugging only, and performance may degrade significantly. Defaults to True.
enable_layerwise_nvtx_marker (bool) – beta If true, enable layerwise NVTX markers. Defaults to False.
enable_min_latency (bool) – beta If true, enable min-latency mode. Currently only used for Llama4. Defaults to False.
stream_interval (int) – stable The iteration interval at which responses are created in streaming mode. Set this to a larger value when the batch size is large to reduce streaming overhead. Defaults to 1.
force_dynamic_quantization (bool) – prototype If true, force dynamic quantization. Defaults to False.
allreduce_strategy (Optional[Literal['AUTO','NCCL','UB','MINLATENCY','ONESHOT','TWOSHOT','LOWPRECISION','MNNVL','NCCL_SYMMETRIC']]) – beta Allreduce strategy to use. Defaults to AUTO.
checkpoint_loader (Optional[tensorrt_llm._torch.models.checkpoints.BaseCheckpointLoader]) – prototype The checkpoint loader to use for this LLM instance. You may use a custom checkpoint loader by subclassing BaseCheckpointLoader and providing an instance of the subclass here to load weights from a custom checkpoint format. If neither checkpoint_format nor checkpoint_loader is provided, checkpoint_format is set to HF and the default HfCheckpointLoader is used. If both checkpoint_format and checkpoint_loader are provided, checkpoint_loader is ignored. Defaults to None.
checkpoint_format (Optional[str]) – prototype The format of the provided checkpoint. You may use a custom checkpoint format by subclassing BaseCheckpointLoader and registering it with register_checkpoint_loader. If neither checkpoint_format nor checkpoint_loader is provided, checkpoint_format is set to HF and the default HfCheckpointLoader is used. If both checkpoint_format and checkpoint_loader are provided, checkpoint_loader is ignored. Defaults to None.
kv_connector_config (Optional[tensorrt_llm.llmapi.llm_args.KvCacheConnectorConfig]) – prototype The config for the KV cache connector. Defaults to None.
mm_encoder_only (bool) – prototype Only load and execute the vision encoder part of the full model. Defaults to False.
ray_worker_extension_cls (Optional[str]) – prototype The full worker extension class name, including the module path. Allows users to extend the functions of the RayGPUWorker class. Defaults to None.
enable_sleep (bool) – prototype Enable the LLM sleep feature. The sleep feature requires extra setup that may slow down model loading, so only enable it if you intend to use this feature. Defaults to False.
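The constructor takes the arguments above as keyword arguments. The sketch below builds an LLM with tensor parallelism and a KvCacheConfig. The model name, the two-GPU setup, and the assumption that the number of GPUs needed equals tensor_parallel_size * pipeline_parallel_size * context_parallel_size are illustrative; the helper required_gpus is hypothetical and not part of TensorRT-LLM.

```python
import math


def required_gpus(tensor_parallel_size: int = 1,
                  pipeline_parallel_size: int = 1,
                  context_parallel_size: int = 1) -> int:
    """Hypothetical helper. Assumes the world size is the product of the
    tensor, pipeline, and context parallel sizes."""
    sizes = (tensor_parallel_size, pipeline_parallel_size, context_parallel_size)
    if any(size < 1 for size in sizes):
        raise ValueError("parallel sizes must be >= 1")
    return math.prod(sizes)


def main() -> None:
    # Imported lazily so this file loads without TensorRT-LLM installed.
    from tensorrt_llm.llmapi import LLM, KvCacheConfig

    tp = 2  # assumption: two GPUs are visible on this node
    print("GPUs needed:", required_gpus(tensor_parallel_size=tp))
    llm = LLM(
        model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # example Hub name or local path
        tensor_parallel_size=tp,
        max_batch_size=8,
        kv_cache_config=KvCacheConfig(free_gpu_memory_fraction=0.8),
    )
    try:
        print("LLM instance:", llm.llm_id)
    finally:
        llm.shutdown()  # release GPU resources
```

Call main() on a machine with TensorRT-LLM and two GPUs; the other arguments listed above are passed the same way.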

tokenizer

The tokenizer loaded by the LLM instance, if any.

Type:: tensorrt_llm.llmapi.tokenizer.TokenizerBase, optional

llm_id

The unique ID of the LLM instance.

Type:: str

__init__(
model: str | Path,
tokenizer: str | Path | TokenizerBase | PreTrainedTokenizerBase | None = None,
tokenizer_mode: Literal['auto', 'slow'] = 'auto',
skip_tokenizer_init: bool = False,
trust_remote_code: bool = False,
tensor_parallel_size: int = 1,
dtype: str = 'auto',
revision: str | None = None,
tokenizer_revision: str | None = None,
**kwargs: Any,
) → None

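generate(inputs, sampling_params=None, use_tqdm=True, lora_request=None, prompt_adapter_request=None, kv_cache_retention_config=None, disaggregated_params=None, scheduling_params=None, cache_salt=None) → RequestOutput | List[RequestOutput]

Generate output for the given prompts in synchronous mode. Synchronous generation accepts either a single prompt or a batch of prompts.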

Parameters:

inputs (tensorrt_llm.inputs.data.PromptInputs,Sequence[tensorrt_llm.inputs.data.PromptInputs]) – The prompt text or token ids. It can be a single prompt or a batch of prompts.
sampling_params (tensorrt_llm.sampling_params.SamplingParams,List[tensorrt_llm.sampling_params.SamplingParams],optional) – The sampling params for the generation. If not provided, default sampling params are used. Defaults to None.
use_tqdm (bool) – Whether to use tqdm to display the progress bar. Defaults to True.
lora_request (tensorrt_llm.executor.request.LoRARequest,Sequence[tensorrt_llm.executor.request.LoRARequest],optional) – LoRA request to use for generation, if any. Defaults to None.
prompt_adapter_request (tensorrt_llm.executor.request.PromptAdapterRequest,Sequence[tensorrt_llm.executor.request.PromptAdapterRequest],optional) – Prompt Adapter request to use for generation, if any. Defaults to None.
kv_cache_retention_config (tensorrt_llm.bindings.executor.KvCacheRetentionConfig,Sequence[tensorrt_llm.bindings.executor.KvCacheRetentionConfig],optional) – Configuration for the request’s retention in the KV Cache. Defaults to None.
disaggregated_params (tensorrt_llm.disaggregated_params.DisaggregatedParams,Sequence[tensorrt_llm.disaggregated_params.DisaggregatedParams],optional) – Disaggregated parameters. Defaults to None.
scheduling_params (tensorrt_llm.scheduling_params.SchedulingParams,List[tensorrt_llm.scheduling_params.SchedulingParams],optional) – Scheduling parameters. Defaults to None.
cache_salt (str,Sequence[str],optional) – If specified, the KV cache is salted with the provided string so that KV cache reuse is limited to requests with the same salt. Defaults to None.

Returns:

The output data of the completion request to the LLM.

Return type:

Union[tensorrt_llm.llmapi.RequestOutput, List[tensorrt_llm.llmapi.RequestOutput]]
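Because generate returns a single RequestOutput for a single prompt and a list for batched prompts, callers often normalize the result. A minimal sketch; the helper first_texts is hypothetical and only reads the documented outputs and text attributes, and the model name is an example.

```python
from typing import Any, List


def first_texts(result: Any) -> List[str]:
    """Return the text of the first CompletionOutput of each RequestOutput.

    generate() returns one RequestOutput for a single prompt and a list for
    batched prompts, so the result is normalized to a list first.
    """
    request_outputs = result if isinstance(result, list) else [result]
    return [ro.outputs[0].text for ro in request_outputs]


def main() -> None:
    from tensorrt_llm.llmapi import LLM, SamplingParams

    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
    try:
        params = SamplingParams(max_tokens=32, temperature=0.8, top_p=0.95)
        batch = llm.generate(["Hello, my name is", "The capital of France is"], params)
        for text in first_texts(batch):
            print(text)
    finally:
        llm.shutdown()
```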

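generate_async(inputs, sampling_params=None, lora_request=None, prompt_adapter_request=None, streaming=False, kv_cache_retention_config=None, disaggregated_params=None, trace_headers=None, scheduling_params=None, cache_salt=None) → RequestOutput

Generate output for the given prompt in asynchronous mode. Asynchronous generation accepts a single prompt only.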

Parameters:

inputs (tensorrt_llm.inputs.data.PromptInputs) – The prompt text or token ids; it must be a single prompt.
sampling_params (tensorrt_llm.sampling_params.SamplingParams,optional) – The sampling params for the generation. If not provided, default sampling params are used. Defaults to None.
lora_request (tensorrt_llm.executor.request.LoRARequest,optional) – LoRA request to use for generation, if any. Defaults to None.
prompt_adapter_request (tensorrt_llm.executor.request.PromptAdapterRequest,optional) – Prompt Adapter request to use for generation, if any. Defaults to None.
streaming (bool) – Whether to use the streaming mode for the generation. Defaults to False.
kv_cache_retention_config (tensorrt_llm.bindings.executor.KvCacheRetentionConfig,optional) – Configuration for the request’s retention in the KV Cache. Defaults to None.
disaggregated_params (tensorrt_llm.disaggregated_params.DisaggregatedParams,optional) – Disaggregated parameters. Defaults to None.
trace_headers (Mapping[str,str],optional) – Trace headers. Defaults to None.
scheduling_params (tensorrt_llm.scheduling_params.SchedulingParams,optional) – Scheduling parameters. Defaults to None.
cache_salt (str,optional) – If specified, the KV cache is salted with the provided string so that KV cache reuse is limited to requests with the same salt. Defaults to None.

Returns:

The output data of the completion request to the LLM.

Return type:

tensorrt_llm.llmapi.RequestOutput
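With streaming=True, the returned RequestOutput can be consumed with async for, and each step's new text is available as outputs[0].text_diff (see CompletionOutput). A sketch under those assumptions; collect_stream and run are hypothetical helpers and the model name is an example.

```python
import asyncio
from typing import Any, AsyncIterable


async def collect_stream(stream: AsyncIterable[Any]) -> str:
    """Concatenate text_diff from each streamed RequestOutput."""
    pieces = []
    async for request_output in stream:
        # text_diff holds only the text produced since the previous step.
        pieces.append(request_output.outputs[0].text_diff)
    return "".join(pieces)


async def run(llm: Any, prompt: str) -> str:
    from tensorrt_llm.llmapi import SamplingParams

    params = SamplingParams(max_tokens=64)
    return await collect_stream(llm.generate_async(prompt, params, streaming=True))


def main() -> None:
    from tensorrt_llm.llmapi import LLM

    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
    try:
        print(asyncio.run(run(llm, "Write a haiku about GPUs:")))
    finally:
        llm.shutdown()
```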

get_kv_cache_events(timeout: float | None = 2) → List[dict]

beta Get iteration KV events from the runtime.

KV events are used to track changes and operations within the KV Cache. Types of events:

KVCacheCreatedData: Indicates the creation of cache blocks.
KVCacheStoredData: Represents a sequence of stored blocks.
KVCacheRemovedData: Contains the hashes of blocks that are being removed from the cache.
KVCacheUpdatedData: Captures updates to existing cache blocks.

To enable KV events:

Set event_buffer_max_size to a positive integer in the KvCacheConfig.
Set enable_block_reuse to True in the KvCacheConfig.

Parameters:: timeout (float,optional) – Max wait time in seconds when retrieving events from queue. Defaults to 2.
Returns:: A list of runtime events as dict.
Return type:: List[dict]
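The two KvCacheConfig settings listed above can be combined with a simple polling loop. A minimal sketch; drain_events is a hypothetical helper that keeps calling a fetch function until it returns an empty batch, and the buffer size and model name are examples. The structure of each event dict is not assumed.

```python
from typing import Callable, List


def drain_events(fetch: Callable[[], List[dict]], max_polls: int = 10) -> List[dict]:
    """Call fetch() until it returns an empty batch or max_polls is reached."""
    events: List[dict] = []
    for _ in range(max_polls):
        batch = fetch()
        if not batch:
            break
        events.extend(batch)
    return events


def main() -> None:
    from tensorrt_llm.llmapi import LLM, KvCacheConfig, SamplingParams

    # Both settings are required for KV events.
    kv_cfg = KvCacheConfig(enable_block_reuse=True, event_buffer_max_size=1024)
    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0", kv_cache_config=kv_cfg)
    try:
        llm.generate(["Hello, my name is"], SamplingParams(max_tokens=16))
        events = drain_events(lambda: llm.get_kv_cache_events(timeout=2))
        print(f"received {len(events)} KV cache events")
    finally:
        llm.shutdown()
```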

get_kv_cache_events_async(timeout: float | None = 2) → IterationResult

beta Get iteration KV events from the runtime.

KV events are used to track changes and operations within the KV Cache. Types of events:

KVCacheCreatedData: Indicates the creation of cache blocks.
KVCacheStoredData: Represents a sequence of stored blocks.
KVCacheRemovedData: Contains the hashes of blocks that are being removed from the cache.
KVCacheUpdatedData: Captures updates to existing cache blocks.

To enable KV events:

Set event_buffer_max_size to a positive integer in the KvCacheConfig.
Set enable_block_reuse to True in the KvCacheConfig.

Parameters:: timeout (float,optional) – Max wait time in seconds when retrieving events from queue. Defaults to 2.
Returns:: An async iterable object containing runtime events.
Return type:: tensorrt_llm.executor.result.IterationResult

get_stats(timeout: float | None = 2) → List[dict]

beta Get iteration statistics from the runtime. To collect statistics, call this function after prompts have been submitted with LLM().generate().

Parameters:

timeout (float,optional) – Max wait time in seconds when retrieving stats from queue. Defaults to 2.

Returns:

A list of runtime stats as dicts, e.g., [‘{“cpuMemUsage”: …, “iter”: 0, …}’, ‘{“cpuMemUsage”: …, “iter”: 1, …}’]

Return type:

List[dict]
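The example return value above shows entries as JSON strings, so a defensive caller may accept both strings and dicts. A sketch under that assumption; parse_stats is a hypothetical helper, and passing enable_iter_perf_stats=True (documented above as enabling iteration performance statistics) is assumed to be needed for stats to appear.

```python
import json
from typing import Any, Dict, List, Union


def parse_stats(raw: List[Union[str, Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """Normalize get_stats() entries to dicts, decoding JSON strings."""
    return [json.loads(item) if isinstance(item, str) else item for item in raw]


def main() -> None:
    from tensorrt_llm.llmapi import LLM, SamplingParams

    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0", enable_iter_perf_stats=True)
    try:
        # Stats are only available after prompts have been submitted.
        llm.generate(["Hello, my name is"], SamplingParams(max_tokens=16))
        for stats in parse_stats(llm.get_stats(timeout=2)):
            print(stats.get("iter"), stats.get("cpuMemUsage"))
    finally:
        llm.shutdown()
```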

get_stats_async(timeout: float | None = 2) → IterationResult

beta Get iteration statistics from the runtime. To collect statistics, call this function in an async coroutine, or query the /metrics endpoint if you are using trtllm-serve, after prompts have been submitted.

Parameters:: timeout (float,optional) – Max wait time in seconds when retrieving stats from queue. Defaults to 2.
Returns:: An async iterable object containing runtime stats.
Return type:: tensorrt_llm.executor.result.IterationResult
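Since get_stats_async returns an async iterable, a coroutine can read a bounded number of entries from it. A minimal sketch; take and poll_stats are hypothetical helpers, and the model name and the limit of five entries are arbitrary.

```python
import asyncio
from typing import Any, AsyncIterable, List


async def take(stream: AsyncIterable[Any], limit: int) -> List[Any]:
    """Collect at most `limit` items from an async iterable."""
    items: List[Any] = []
    if limit <= 0:
        return items
    async for item in stream:
        items.append(item)
        if len(items) >= limit:
            break
    return items


async def poll_stats(llm: Any) -> None:
    # Assumes llm was created with enable_iter_perf_stats=True.
    for stats in await take(llm.get_stats_async(timeout=2), limit=5):
        print(stats)


def main() -> None:
    from tensorrt_llm.llmapi import LLM, SamplingParams

    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0", enable_iter_perf_stats=True)
    try:
        llm.generate(["Hello, my name is"], SamplingParams(max_tokens=16))
        asyncio.run(poll_stats(llm))
    finally:
        llm.shutdown()
```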

shutdown() → None: beta

property llm_id: str: beta

property tokenizer: TokenizerBase | None

class tensorrt_llm.llmapi.MultimodalEncoder(
model: str | Path,
trust_remote_code: bool = False,
tensor_parallel_size: int = 1,
dtype: Literal['auto', 'float16', 'float32', 'bfloat16'] = 'auto',
**kwargs: Any,
)

Bases: _TorchLLM

The MultimodalEncoder class is the main class for running a multimodal encoder model using the PyTorch backend.

__init__(
model: str | Path,
trust_remote_code: bool = False,
tensor_parallel_size: int = 1,
dtype: Literal['auto', 'float16', 'float32', 'bfloat16'] = 'auto',
**kwargs: Any,
) → None

Generate output for the given prompts in synchronous mode. Synchronous generation accepts either a single prompt or a batch of prompts.

Parameters:: inputs (tensorrt_llm.inputs.data.PromptInputs,Sequence[tensorrt_llm.inputs.data.PromptInputs]) – The prompt text or token ids. It can be a single prompt or a batch of prompts.
Returns:: The output data of the completion request to the LLM.
Return type:: Union[tensorrt_llm.llmapi.RequestOutput, List[tensorrt_llm.llmapi.RequestOutput]]

generate_async(inputs: str | List[int] | TextPrompt | TokensPrompt, sampling_params: SamplingParams | None = None) → RequestOutput

Generate output for the given multimodal request in asynchronous mode. Asynchronous generation accepts a single multimodal request only.

Returns:: A future that resolves to a tensorrt_llm.llmapi.RequestOutput containing mm_embeddings.

get_kv_cache_events(timeout: float | None = 2) → List[dict]

beta Get iteration KV events from the runtime.

KV events are used to track changes and operations within the KV Cache. Types of events:

KVCacheCreatedData: Indicates the creation of cache blocks.
KVCacheStoredData: Represents a sequence of stored blocks.
KVCacheRemovedData: Contains the hashes of blocks that are being removed from the cache.
KVCacheUpdatedData: Captures updates to existing cache blocks.

To enable KV events:

Set event_buffer_max_size to a positive integer in the KvCacheConfig.
Set enable_block_reuse to True in the KvCacheConfig.

Parameters:: timeout (float,optional) – Max wait time in seconds when retrieving events from queue. Defaults to 2.
Returns:: A list of runtime events as dict.
Return type:: List[dict]

get_kv_cache_events_async(timeout: float | None = 2) → IterationResult

beta Get iteration KV events from the runtime.

KV events are used to track changes and operations within the KV Cache. Types of events:

KVCacheCreatedData: Indicates the creation of cache blocks.
KVCacheStoredData: Represents a sequence of stored blocks.
KVCacheRemovedData: Contains the hashes of blocks that are being removed from the cache.
KVCacheUpdatedData: Captures updates to existing cache blocks.

To enable KV events:

Set event_buffer_max_size to a positive integer in the KvCacheConfig.
Set enable_block_reuse to True in the KvCacheConfig.

Parameters:: timeout (float,optional) – Max wait time in seconds when retrieving events from queue. Defaults to 2.
Returns:: An async iterable object containing runtime events.
Return type:: tensorrt_llm.executor.result.IterationResult

get_stats(timeout: float | None = 2) → List[dict]

beta Get iteration statistics from the runtime. To collect statistics, call this function after prompts have been submitted with LLM().generate().

Parameters:

timeout (float,optional) – Max wait time in seconds when retrieving stats from queue. Defaults to 2.

Returns:

A list of runtime stats as dicts, e.g., [‘{“cpuMemUsage”: …, “iter”: 0, …}’, ‘{“cpuMemUsage”: …, “iter”: 1, …}’]

Return type:

List[dict]

get_stats_async(timeout: float | None = 2) → IterationResult

Parameters:: timeout (float,optional) – Max wait time in seconds when retrieving stats from queue. Defaults to 2.
Returns:: An async iterable object containing runtime stats.
Return type:: tensorrt_llm.executor.result.IterationResult

shutdown() → None: beta

property llm_id: str: beta

property tokenizer: TokenizerBase | None

class tensorrt_llm.llmapi.CompletionOutput(
index: int,
text: str = '',
token_ids: List[int] | None = <factory>,
cumulative_logprob: float | None = None,
logprobs: list[dict[int, Logprob]] | List[float] | None = <factory>,
prompt_logprobs: list[dict[int, Logprob]] | None = <factory>,
finish_reason: Literal['stop', 'length', 'timeout', 'cancelled'] | None = None,
stop_reason: int | str | None = None,
generation_logits: torch.Tensor | None = None,
additional_context_outputs: Dict[str, torch.Tensor] | None = None,
additional_generation_outputs: Dict[str, torch.Tensor] | None = None,
disaggregated_params: DisaggregatedParams | None = None,
request_perf_metrics: RequestPerfMetrics | None = None,
_postprocess_result: Any = None,
)

Bases: object

The output data of one completion output of a request.

Parameters:

index (int) – The index of the output in the request.
text (str) – The generated output text. Defaults to “”.
token_ids (List[int],optional) – The token ids of the generated output text. Defaults to [].
cumulative_logprob (float,optional) – The cumulative log probability of the generated output text. Defaults to None.
logprobs (TokenLogprobs |List[float],optional) – The log probabilities of the top-probability tokens at each position, if logprobs are requested. Defaults to None.
prompt_logprobs (TokenLogprobs,optional) – The log probabilities per prompt token. Defaults to None.
finish_reason (Literal['stop','length','timeout','cancelled'],optional) – The reason why the sequence is finished. Defaults to None.
stop_reason (int,str,optional) – The stop string or token id that caused the completion to stop, None if the completion finished for some other reason. Defaults to None.
generation_logits (torch.Tensor,optional) – The logits on the generated output token ids. Defaults to None.
additional_context_outputs (Dict[str,torch.Tensor],optional) – The additional context outputs. Defaults to None.
additional_generation_outputs (Dict[str,torch.Tensor],optional) – The additional generation outputs. Defaults to None.
disaggregated_params (tensorrt_llm.disaggregated_params.DisaggregatedParams,optional) – Parameters needed for disaggregated serving. Includes the type of request, the first generated tokens, the context request id, and any additional state that must be transferred between the context and generation instances. Defaults to None.
request_perf_metrics (tensorrt_llm.bindings.executor.RequestPerfMetrics,optional) – Performance metrics for the request. Defaults to None.
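finish_reason and stop_reason make it possible to tell, for example, which completions were cut off by the token limit and might be retried with a larger max_tokens. A sketch; truncated_indices and describe are hypothetical helpers that only read the fields documented above.

```python
from typing import Any, List


def truncated_indices(completions: List[Any]) -> List[int]:
    """Indices of CompletionOutputs that stopped at the length limit."""
    return [c.index for c in completions if c.finish_reason == "length"]


def describe(completion: Any) -> str:
    """Summarize why a CompletionOutput finished."""
    if completion.finish_reason == "stop" and completion.stop_reason is not None:
        # stop_reason is the stop string or token id that ended generation.
        return f"stopped on {completion.stop_reason!r}"
    return completion.finish_reason or "unfinished"
```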

length

The number of generated tokens.

Type:: int

token_ids_diff

Newly generated token ids.

Type:: List[int]

logprobs_diff

Logprobs of newly generated tokens.

Type:: TokenLogprobs | List[float]

text_diff

Newly generated text.

__init__(
index: int,
text: str = '',
token_ids: List[int] | None = <factory>,
cumulative_logprob: float | None = None,
logprobs: list[dict[int, Logprob]] | List[float] | None = <factory>,
prompt_logprobs: list[dict[int, Logprob]] | None = <factory>,
finish_reason: Literal['stop', 'length', 'timeout', 'cancelled'] | None = None,
stop_reason: int | str | None = None,
generation_logits: torch.Tensor | None = None,
additional_context_outputs: Dict[str, torch.Tensor] | None = None,
additional_generation_outputs: Dict[str, torch.Tensor] | None = None,
disaggregated_params: DisaggregatedParams | None = None,
request_perf_metrics: RequestPerfMetrics | None = None,
_postprocess_result: Any = None,
) → None

additional_context_outputs: Dict[str, Tensor] | None

additional_generation_outputs: Dict[str, Tensor] | None

cumulative_logprob: float | None

disaggregated_params: DisaggregatedParams | None

finish_reason: Literal['stop', 'length', 'timeout', 'cancelled'] | None

generation_logits: Tensor | None

index: int

property length: int

logprobs: list[dict[int, Logprob]] | List[float] | None

property logprobs_diff: list[dict[int, Logprob]] | List[float]

prompt_logprobs: list[dict[int, Logprob]] | None

request_perf_metrics: RequestPerfMetrics | None

stop_reason: int | str | None

text: str

property text_diff: str

token_ids: List[int] | None

property token_ids_diff: List[int]

class tensorrt_llm.llmapi.RequestOutput

Bases: DetokenizedGenerationResultBase, GenerationResult

The output data of a completion request to the LLM.

request_id

The unique ID of the request.

Type:: int

prompt

The prompt string of the request.

Type:: str, optional

prompt_token_ids

The token ids of the prompt.

Type:: List[int]

outputs

The output sequences of the request.

Type:: List[CompletionOutput]

context_logits

The logits on the prompt token ids.

Type:: torch.Tensor, optional

mm_embedding_handle

The multimodal embedding handle of the request.

Type:: Dict[str, Any], optional

finished

Whether the whole request is finished.

Type:: bool
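When a request produces several sequences (for example with n > 1 in SamplingParams), outputs holds one CompletionOutput per sequence. The sketch below picks the one with the highest cumulative_logprob; it assumes cumulative_logprob may be None when log probabilities are not populated, and best_completion is a hypothetical helper.

```python
from typing import Any, Optional


def best_completion(request_output: Any) -> Optional[Any]:
    """Return the CompletionOutput with the highest cumulative_logprob.

    Completions without a score are skipped; returns None if none has one.
    """
    scored = [c for c in request_output.outputs if c.cumulative_logprob is not None]
    if not scored:
        return None
    return max(scored, key=lambda c: c.cumulative_logprob)
```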

class PostprocWorker(pull_pipe_addr: tuple[str, bytes | None], push_pipe_addr: tuple[str, bytes | None], tokenizer_dir: str, record_creator: Callable[[Input, TransformersTokenizer], Any])

Bases: object

The worker to postprocess the responses from the executor’s await_response.

class Input(rsp: tllm.Response | ResponseWrapper, sampling_params: SamplingParams | None = None, postproc_params: PostprocParams | None = None, streaming: bool | None = None)

Bases: object

__init__(rsp: tllm.Response | ResponseWrapper, sampling_params: SamplingParams | None = None, postproc_params: PostprocParams | None = None, streaming: bool | None = None) → None

postproc_params: PostprocParams | None = None

rsp: tllm.Response | ResponseWrapper

sampling_params: SamplingParams | None = None

streaming: bool | None = None

class Output(client_id, res, is_final, error, metrics, request_perf_metrics, disaggregated_params)

Bases: NamedTuple

count(value, /): Return number of occurrences of value.

index(value, start=0, stop=9223372036854775807, /)

Return first index of value.

Raises ValueError if the value is not present.

client_id: int: Alias for field number 0

disaggregated_params: Any: Alias for field number 6

error: str: Alias for field number 3

is_final: bool: Alias for field number 2

metrics: dict[str, float] | None: Alias for field number 4

request_perf_metrics: Any: Alias for field number 5

res: Any: Alias for field number 1

__init__(pull_pipe_addr: tuple[str, bytes | None], push_pipe_addr: tuple[str, bytes | None], tokenizer_dir: str, record_creator: Callable[[Input, TransformersTokenizer], Any])

Parameters:

pull_pipe_addr (tuple[str,Optional[bytes]]) – The address and HMAC key of the input IPC.
push_pipe_addr (tuple[str,Optional[bytes]]) – The address and HMAC key of the output IPC.
tokenizer_dir (str) – The directory to load tokenizer.
record_creator (Callable[["ResponsePostprocessWorker.Input"],Any]) – A creator for creating a record for a request.
result_handler (Optional[Callable[[GenerationResultBase],Any]]) – A callback that handles the final result.

static default_record_creator(inp: PostprocWorker.Input, tokenizer: TransformersTokenizer) → DetokenizedGenerationResultBase

start(): Start the workflow in the current thread.

__init__() → None

abort() → None: Abort the generation request.

aborted() → bool

Return whether the generation request is aborted.

Returns:: whether the generation request is aborted.
Return type:: bool

async aresult() → GenerationResult

Wait for the completion of the request, and return the result.

Returns:: generation result.
Return type:: tensorrt_llm.executor.result.GenerationResult

clear_logprob_params() → None

do_tracing(output: CompletionOutput, req_perf_metrics_dict: dict[str, float] | None = None) → None

Perform distributed tracing for the generation request.

Parameters:

output (CompletionOutput) – The output of the generation result.
req_perf_metrics_dict (Optional[dict[str,float]]) – Request performance metrics. Defaults to None.

record_stats(output: CompletionOutput, stats: dict[str, float] | None = None) → None

Record the stats of the generation result.

Parameters:

output (CompletionOutput) – The output of the generation result.
stats (Optional[dict[str,float]]) – The stats of the generation result. Defaults to None.

result(timeout: float | None = None) → GenerationResult

Wait for the completion of the request, and return the result.

Parameters:: timeout (float,optional) – Timeout. Defaults to None.
Returns:: generation result.
Return type:: tensorrt_llm.executor.result.GenerationResult
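Because generate_async returns a RequestOutput immediately, several requests can be submitted before waiting on them, either with result() from synchronous code or with aresult() inside a coroutine. A sketch; wait_all and await_all are hypothetical helpers, and the prompts, timeout, and model name are examples.

```python
import asyncio
from typing import Any, List, Sequence


def wait_all(pending: Sequence[Any], timeout: float = 60.0) -> List[Any]:
    """Block on each pending RequestOutput with result()."""
    return [p.result(timeout=timeout) for p in pending]


async def await_all(pending: Sequence[Any]) -> List[Any]:
    """Await all pending RequestOutputs concurrently with aresult()."""
    return list(await asyncio.gather(*(p.aresult() for p in pending)))


def main() -> None:
    from tensorrt_llm.llmapi import LLM, SamplingParams

    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
    try:
        params = SamplingParams(max_tokens=16)
        # All three requests are in flight before we start waiting.
        pending = [llm.generate_async(p, params) for p in ["Hi", "Bonjour", "Hola"]]
        for done in wait_all(pending):
            print(done.outputs[0].text)
    finally:
        llm.shutdown()
```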

property context_logits: Tensor | None

property finished: bool

property mm_embedding_handle: Dict[str, Any] | None

property outputs: List[CompletionOutput]

property prompt: str | None

property prompt_token_ids: List[int]

property request_id: int

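class tensorrt_llm.llmapi.GuidedDecodingParams(*, json: str | BaseModel | dict | None = None, regex: str | None = None, grammar: str | None = None, json_object: bool = False, structural_tag: str | None = None)

Bases: object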

Guided decoding parameters for text generation. Only one of the fields should be set; only one can take effect.

Parameters:

json (str,pydantic.main.BaseModel,dict,optional) – The generated text is amenable to json format with additional user-specified restrictions, namely schema. Defaults to None.
regex (str,optional) – The generated text is amenable to the user-specified regular expression. Defaults to None.
grammar (str,optional) – The generated text is amenable to the user-specified extended Backus-Naur form (EBNF) grammar. Defaults to None.
json_object (bool) – If True, the generated text is amenable to json format. Defaults to False.
structural_tag (str,optional) – The generated text is amenable to the user-specified structural tag. Structural tag is supported by xgrammar backend only. Defaults to None.
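The rule that only one field can take effect can be checked up front. The sketch below uses a hypothetical dataclass, GuidedSpec, that mirrors the five fields; it is not the library class, and this reference does not say whether GuidedDecodingParams itself raises on conflicting fields, so the ValueError here is an assumption.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class GuidedSpec:
    """Hypothetical mirror of the GuidedDecodingParams fields (not the library class)."""
    json: Optional[Any] = None
    regex: Optional[str] = None
    grammar: Optional[str] = None
    json_object: bool = False
    structural_tag: Optional[str] = None

    def active_field(self) -> Optional[str]:
        # A field counts as set unless it is None (or False, for json_object).
        active = [
            f.name for f in fields(self)
            if getattr(self, f.name) is not None and getattr(self, f.name) is not False
        ]
        if len(active) > 1:
            raise ValueError(f"only one guided decoding field may be set, got {active}")
        return active[0] if active else None


if __name__ == "__main__":
    print(GuidedSpec(regex=r"\d{3}-\d{4}").active_field())
```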

__init__(*, json: str | BaseModel | dict | None = None, regex: str | None = None, grammar: str | None = None, json_object: bool = False, structural_tag: str | None = None) → None

grammar: str | None

json: str | BaseModel | dict | None

json_object: bool

regex: str | None

structural_tag: str | None

class tensorrt_llm.llmapi.SamplingParams(*, end_id: int | None = None, pad_id: int | None = None, max_tokens: int = 32, bad: str | List[str] | None = None, bad_token_ids: List[int] | None = None, stop: str | List[str] | None = None, stop_token_ids: List[int] | None = None, include_stop_str_in_output: bool = False, embedding_bias: Tensor | None = None, logits_processor: LogitsProcessor | List[LogitsProcessor] | None = None, apply_batched_logits_processor: bool = False, n: int = 1, best_of: int | None = None, use_beam_search: bool = False, top_k: int | None = None, top_p: float | None = None, top_p_min: float | None = None, top_p_reset_ids: int | None = None, top_p_decay: float | None = None, seed: int | None = None, temperature: float | None = None, min_tokens: int | None = None, beam_search_diversity_rate: float | None = None, repetition_penalty: float | None = None, presence_penalty: float | None = None, frequency_penalty: float | None = None, prompt_ignore_length: int | None = None, length_penalty: float | None = None, early_stopping: int | None = None, no_repeat_ngram_size: int | None = None, min_p: float | None = None, beam_width_array: List[int] | None = None, logprobs: int | None = None, prompt_logprobs: int | None = None, return_context_logits: bool = False, return_generation_logits: bool = False, exclude_input_from_output: bool = True, return_encoder_output: bool = False, return_perf_metrics: bool = False, additional_model_outputs: List[str] | None = None, _context_logits_auto_enabled: bool = False, _generation_logits_auto_enabled: bool = False, _return_log_probs: bool = False, lookahead_config: LookaheadDecodingConfig | None = None, guided_decoding: GuidedDecodingParams | None = None, ignore_eos: bool = False, detokenize: bool = True, add_special_tokens: bool = True, truncate_prompt_tokens: int | None = None, skip_special_tokens: bool = True, spaces_between_special_tokens: bool = True)

Bases: object

Sampling parameters for text generation.

Usage examples:

use_beam_search is False:
  best_of is None: top-p/top-k sampling of n responses; returns n generations.
  best_of is not None: top-p/top-k sampling of best_of responses; returns n generations (best_of >= n must hold).
use_beam_search is True:
  best_of is None: beam search with a beam width of n; returns n generations.
  best_of is not None: beam search with a beam width of best_of; returns n generations (best_of >= n must hold).
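The four cases above reduce to one rule: the search width is best_of when given and n otherwise, and n generations always come back. plan_generation is a hypothetical helper that encodes only that documented rule; it does not call TensorRT-LLM.

```python
from typing import NamedTuple, Optional


class GenerationPlan(NamedTuple):
    mode: str      # "sampling" or "beam_search"
    width: int     # number of candidates sampled, or the beam width
    returned: int  # number of generations returned to the caller


def plan_generation(n: int = 1, best_of: Optional[int] = None,
                    use_beam_search: bool = False) -> GenerationPlan:
    # best_of, when given, widens the search; n is always what is returned.
    width = n if best_of is None else best_of
    if width < n:
        raise ValueError("best_of >= n must hold")
    mode = "beam_search" if use_beam_search else "sampling"
    return GenerationPlan(mode, width, n)


if __name__ == "__main__":
    print(plan_generation(n=2, best_of=4, use_beam_search=True))
```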

Parameters:

end_id (int, optional) – The end token id. Defaults to None.
pad_id (int, optional) – The pad token id. Defaults to None.
max_tokens (int) – The maximum number of tokens to generate. Defaults to 32.
bad (str, List[str], optional) – A string or a list of strings that redirect the generation when they are generated, so that the bad strings are excluded from the returned output. Defaults to None.
bad_token_ids (List[int], optional) – A list of token ids that redirect the generation when they are generated, so that the bad ids are excluded from the returned output. Defaults to None.
stop (str, List[str], optional) – A string or a list of strings that stop the generation when they are generated. The returned output will not contain the stop strings unless include_stop_str_in_output is True. Defaults to None.
stop_token_ids (List[int], optional) – A list of token ids that stop the generation when they are generated. Defaults to None.
include_stop_str_in_output (bool) – Whether to include the stop strings in the output text. Defaults to False.
embedding_bias (torch.Tensor, optional) – The embedding bias tensor. Expected type is kFP32 and shape is [vocab_size]. Defaults to None.
logits_processor (tensorrt_llm.sampling_params.LogitsProcessor, List[tensorrt_llm.sampling_params.LogitsProcessor], optional) – The logits postprocessor callback(s). Defaults to None. If a list, each processor is applied in order during generation (supported in the PyTorch backend only).
apply_batched_logits_processor (bool) – Whether to apply the batched logits postprocessor callback. Defaults to False. The BatchedLogitsProcessor class is recommended for callback creation. The callback must be provided when initializing LLM.
n (int) – Number of sequences to generate. Defaults to 1.
best_of (int, optional) – Number of sequences to consider for the best output. Defaults to None.
use_beam_search (bool) – Whether to use beam search. Defaults to False.
top_k (int, optional) – Controls the number of logits to sample from. Can assume non-negative values, where 0 means ‘all logits’. Defaults to None. The value None is treated as “not specified” in the following. If neither temperature, top_p, nor top_k is specified, sampling is greedy. If temperature > 0 and/or top_p < 1 are specified, sampling proceeds accordingly and top_k defaults to top_k = 0. Setting top_k = 1 results in greedy sampling.
top_p (float, optional) – Controls the top-P probability to sample from. Can have values between 0 and 1. Defaults to None. The value None is treated as “not specified” in the following. If neither temperature, top_p, nor top_k is specified, sampling is greedy. If temperature > 0 and/or top_k > 1 are specified, sampling proceeds accordingly and top_p defaults to top_p = 1. Setting top_p = 0 should result in greedy sampling, but is currently disallowed in the backend.
top_p_min (float, optional) – Controls decay in the top-P algorithm; top_p_min is the lower bound. None means using the C++ runtime default 1.e-6. Defaults to None.
top_p_reset_ids (int, optional) – Controls decay in the top-P algorithm. Indicates where to reset the decay. None means using the C++ runtime default 1. Defaults to None.
top_p_decay (float, optional) – Controls decay in the top-P algorithm. The decay value. None means using the C++ runtime default 1.f. Defaults to None.
seed (int, optional) – Controls the random seed used by the random number generator in sampling. None means using the C++ runtime default 0. Defaults to None.
temperature (float, optional) – Controls the modulation of logits when sampling new tokens. It can have values >= 0.f. Defaults to None. The value None is treated as “not specified” in the following. If neither temperature, top_p, nor top_k is specified, sampling is greedy. If top_p < 1 and/or top_k > 1 are specified, sampling proceeds accordingly and temperature defaults to temperature = 1. Setting temperature = 0 results in greedy sampling.
min_tokens (int, optional) – Lower bound on the number of tokens to generate. Values < 1 have no effect. None means using the C++ runtime default 1. Defaults to None.
beam_search_diversity_rate (float, optional) – Controls the diversity among beams in beam search. None means using the C++ runtime default. Defaults to None.
repetition_penalty (float, optional) – Used to penalize tokens based on how often they appear in the sequence. It can have any value > 0.f. Values < 1.f encourage repetition, values > 1.f discourage it. None means using the C++ runtime default 1.f. Defaults to None.
presence_penalty (float, optional) – Used to penalize tokens already present in the sequence (irrespective of the number of appearances). It can have any value. Values < 0.f encourage repetition, values > 0.f discourage it. None means using the C++ runtime default 0.f. Defaults to None.
frequency_penalty (float, optional) – Used to penalize tokens already present in the sequence (dependent on the number of appearances). It can have any value. Values < 0.f encourage repetition, values > 0.f discourage it. None means using the C++ runtime default 0.f. Defaults to None.
prompt_ignore_length (int, optional) – Controls how many tokens to ignore from the prompt for presence and frequency penalties. Values <= 0 have no effect. Values > input (prompt) length will be clamped. None means using the C++ runtime default 0. Defaults to None.
length_penalty (float, optional) – Controls how to penalize longer sequences in beam search. None means using the C++ runtime default 0.f. Defaults to None.
early_stopping (int, optional) – Controls whether the generation process finishes once beamWidth sentences are generated (ending with end_token). None means using the C++ runtime default 1. Defaults to None.
no_repeat_ngram_size (int, optional) – The n-gram size that may not repeat: any n-gram of this size can appear at most once in the output. None means using the C++ runtime default 1 << 30. Defaults to None.
min_p (float, optional) – Scales the probability of the most likely token to determine the minimum token probability. None means using the C++ runtime default 0.0. Defaults to None.
beam_width_array (List[int], optional) – The array of beam widths used in Variable-Beam-Width-Search. Defaults to None.
logprobs (int, optional) – Number of log probabilities to return per output token. Defaults to None.
prompt_logprobs (int, optional) – Number of log probabilities to return per prompt token. Defaults to None.
return_context_logits (bool) – Controls whether Result should contain the context logits. Defaults to False.
return_generation_logits (bool) – Controls whether Result should contain the generation logits. Defaults to False.
exclude_input_from_output (bool) – Controls whether the input tokens are excluded from the output tokens in Result. Defaults to True.
return_encoder_output (bool) – Controls whether Result should contain encoder output hidden states (for encoder-only and encoder-decoder models). Defaults to False.
return_perf_metrics (bool) – Controls whether Result should contain the performance metrics for this request. Defaults to False.
additional_model_outputs (List[str], optional) – The additional outputs to gather from the model. Defaults to None.
lookahead_config (tensorrt_llm.bindings.executor.LookaheadDecodingConfig, optional) – Lookahead decoding config. Defaults to None.
guided_decoding (tensorrt_llm.sampling_params.GuidedDecodingParams, optional) – Guided decoding params. Defaults to None.
ignore_eos (bool) – Whether to ignore the EOS token and continue generating tokens after the EOS token is generated. Defaults to False.
detokenize (bool) – Whether to detokenize the output. Defaults to True.
add_special_tokens (bool) – Whether to add special tokens to the prompt. Defaults to True.
truncate_prompt_tokens (int, optional) – If set to an integer k, only the last k tokens of the prompt are used (i.e., left truncation). Defaults to None.
skip_special_tokens (bool) – Whether to skip special tokens in the output. Defaults to True.
spaces_between_special_tokens (bool) – Whether to add spaces between special tokens in the output. Defaults to True.
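The repetition, presence, and frequency penalties can be sketched on a single logits vector. apply_penalties is hypothetical and follows a common formulation used by other inference libraries (a multiplicative repetition penalty applied once per seen token, plus additive presence and frequency penalties); the exact order and formulas TensorRT-LLM uses, and the effect of prompt_ignore_length, are assumptions not covered here.

```python
from collections import Counter
from typing import List, Sequence


def apply_penalties(
    logits: Sequence[float],
    generated_ids: Sequence[int],
    repetition_penalty: float = 1.0,
    presence_penalty: float = 0.0,
    frequency_penalty: float = 0.0,
) -> List[float]:
    """Return a penalized copy of `logits` (one value per vocabulary id)."""
    counts = Counter(generated_ids)
    out = list(logits)
    for token_id, count in counts.items():
        x = out[token_id]
        # Repetition penalty (multiplicative): values > 1 lower the token's
        # probability whether its logit is positive or negative.
        x = x / repetition_penalty if x > 0 else x * repetition_penalty
        # Presence penalty: a flat subtraction for any token already seen.
        x -= presence_penalty
        # Frequency penalty: scaled by how many times the token was seen.
        x -= frequency_penalty * count
        out[token_id] = x
    return out


if __name__ == "__main__":
    print(apply_penalties([2.0, -1.0, 0.5], [0, 0, 1],
                          repetition_penalty=2.0, presence_penalty=0.5,
                          frequency_penalty=0.25))
```

With the neutral values (repetition_penalty = 1, presence_penalty = frequency_penalty = 0) the logits pass through unchanged, which matches the documented runtime defaults.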

__init__(*, end_id: int | None = None, pad_id: int | None = None, max_tokens: int = 32, bad: str | List[str] | None = None, bad_token_ids: List[int] | None = None, stop: str | List[str] | None = None, stop_token_ids: List[int] | None = None, include_stop_str_in_output: bool = False, embedding_bias: Tensor | None = None, logits_processor: LogitsProcessor | List[LogitsProcessor] | None = None, apply_batched_logits_processor: bool = False, n: int = 1, best_of: int | None = None, use_beam_search: bool = False, top_k: int | None = None, top_p: float | None = None, top_p_min: float | None = None, top_p_reset_ids: int | None = None, top_p_decay: float | None = None, seed: int | None = None, temperature: float | None = None, min_tokens: int | None = None, beam_search_diversity_rate: float | None = None, repetition_penalty: float | None = None, presence_penalty: float | None = None, frequency_penalty: float | None = None, prompt_ignore_length: int | None = None, length_penalty: float | None = None, early_stopping: int | None = None, no_repeat_ngram_size: int | None = None, min_p: float | None = None, beam_width_array: List[int] | None = None, logprobs: int | None = None, prompt_logprobs: int | None = None, return_context_logits: bool = False, return_generation_logits: bool = False, exclude_input_from_output: bool = True, return_encoder_output: bool = False, return_perf_metrics: bool = False, additional_model_outputs: List[str] | None = None, _context_logits_auto_enabled: bool = False, _generation_logits_auto_enabled: bool = False, _return_log_probs: bool = False, lookahead_config: LookaheadDecodingConfig | None = None, guided_decoding: GuidedDecodingParams | None = None, ignore_eos: bool = False, detokenize: bool = True, add_special_tokens: bool = True, truncate_prompt_tokens: int | None = None, skip_special_tokens: bool = True, spaces_between_special_tokens: bool = True) → None

static params_imply_greedy_decoding(*, temperature: float | None, top_p: float | None, top_k: int | None)
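params_imply_greedy_decoding has no description here; the greedy rules stated under top_k, top_p, and temperature can be collected into one sketch. implies_greedy is a hypothetical helper that encodes only those documented rules. It is not the library's implementation, and cases the text leaves open (for example top_k = 0 on its own) are treated as non-greedy by assumption.

```python
from typing import Optional


def implies_greedy(temperature: Optional[float] = None,
                   top_p: Optional[float] = None,
                   top_k: Optional[int] = None) -> bool:
    """Apply the documented greedy-decoding rules (hypothetical helper)."""
    if top_p is not None and top_p == 0:
        # Documented as "should be greedy" but currently disallowed by the backend.
        raise ValueError("top_p = 0 is disallowed; use top_k = 1 or temperature = 0")
    if temperature is None and top_p is None and top_k is None:
        return True  # nothing specified -> greedy
    # top_k = 1 or temperature = 0 are each documented to give greedy sampling.
    return top_k == 1 or temperature == 0


if __name__ == "__main__":
    print(implies_greedy(), implies_greedy(temperature=0.8, top_p=0.95))
```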

add_special_tokens: bool

additional_model_outputs: List[str] | None

apply_batched_logits_processor: bool

bad: str | List[str] | None

bad_token_ids: List[int] | None

beam_search_diversity_rate: float | None

beam_width_array: List[int] | None

best_of: int | None

detokenize: bool

early_stopping: int | None

embedding_bias: Tensor | None

end_id: int | None

exclude_input_from_output: bool

frequency_penalty: float | None

guided_decoding: GuidedDecodingParams | None

ignore_eos: bool

include_stop_str_in_output: bool

length_penalty: float | None

logits_processor: LogitsProcessor | List[LogitsProcessor] | None

logprobs: int | None

lookahead_config: LookaheadDecodingConfig | None

max_tokens: int

min_p: float | None

min_tokens: int | None

n: int

no_repeat_ngram_size: int | None

pad_id: int | None

presence_penalty: float | None

prompt_ignore_length: int | None

prompt_logprobs: int | None

repetition_penalty: float | None

return_context_logits: bool

return_encoder_output: bool

return_generation_logits: bool

return_perf_metrics: bool

seed: int | None

skip_special_tokens: bool

spaces_between_special_tokens: bool

stop: str | List[str] | None

stop_token_ids: List[int] | None

temperature: float | None

top_k: int | None

top_p: float | None

top_p_decay: float | None

top_p_min: float | None

top_p_reset_ids: int | None

truncate_prompt_tokens: int | None

use_beam_search: bool

class tensorrt_llm.llmapi.DisaggregatedParams(*, request_type: str | None = None, first_gen_tokens: List[int] | None = None, ctx_request_id: int | None = None, opaque_state: bytes | None = None, draft_tokens: List[int] | None = None, multimodal_embedding_handles: List[Dict[str, Any]] | None = None, multimodal_hashes: List[List[int]] | None = None)

Bases: object

Disaggregated serving parameters.

Parameters:

request_type (str) – The type of request (“context_only” | “generation_only” | “context_and_generation”).
first_gen_tokens (List[int]) – The first tokens of the generation request.
ctx_request_id (int) – The context request id.
opaque_state (bytes) – Any additional state that needs to be exchanged between the context and generation instances.
draft_tokens (List[int]) – The draft tokens of the generation request.
multimodal_embedding_handles (List[Dict[str, Any]]) – The multimodal embedding handles produced by the vision encoder (ViT).
multimodal_hashes (List[List[int]]) – The multimodal hashes of each multimodal item in the request.
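The request types suggest a two-step flow: a context instance processes the prompt and hands its state to a generation instance. The sketch below models only that hand-off with a hypothetical dataclass, DisaggParamsSketch, carrying a subset of the fields. How TensorRT-LLM actually returns these params from the context server and routes them to the generation server is an assumption not covered by this reference.

```python
from dataclasses import dataclass, replace
from typing import List, Optional

VALID_REQUEST_TYPES = {"context_only", "generation_only", "context_and_generation"}


@dataclass
class DisaggParamsSketch:
    """Hypothetical subset of the DisaggregatedParams fields (not the library class)."""
    request_type: Optional[str] = None
    first_gen_tokens: Optional[List[int]] = None
    ctx_request_id: Optional[int] = None
    opaque_state: Optional[bytes] = None

    def __post_init__(self) -> None:
        if self.request_type is not None and self.request_type not in VALID_REQUEST_TYPES:
            raise ValueError(f"unknown request_type: {self.request_type!r}")


def to_generation_request(ctx_params: DisaggParamsSketch) -> DisaggParamsSketch:
    """Turn params returned by a context-only request into a generation-only request."""
    if ctx_params.request_type != "context_only":
        raise ValueError("expected params from a context_only request")
    # Context-side state (first tokens, request id, opaque state) is carried over unchanged.
    return replace(ctx_params, request_type="generation_only")


if __name__ == "__main__":
    ctx = DisaggParamsSketch("context_only", first_gen_tokens=[42],
                             ctx_request_id=7, opaque_state=b"kv-meta")
    print(to_generation_request(ctx))
```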

__init__(*, request_type: str | None = None, first_gen_tokens: List[int] | None = None, ctx_request_id: int | None = None, opaque_state: bytes | None = None, draft_tokens: List[int] | None = None, multimodal_embedding_handles: List[Dict[str, Any]] | None = None, multimodal_hashes: List[List[int]] | None = None) → None

get_context_phase_params() → ContextPhaseParams

get_request_type() → RequestType

ctx_request_id: int | None

draft_tokens: List[int] | None

first_gen_tokens: List[int] | None

multimodal_embedding_handles: List[Dict[str, Any]] | None

multimodal_hashes: List[List[int]] | None

opaque_state: bytes | None

request_type: str | None

class tensorrt_llm.llmapi.KvCacheConfig(*, enable_block_reuse: bool = True, max_tokens: int | None = None, max_attention_window: List[int] | None = None, sink_token_length: int | None = None, free_gpu_memory_fraction: float | None = 0.9, host_cache_size: int | None = None, onboard_blocks: bool = True, cross_kv_cache_fraction: float | None = None, secondary_offload_min_priority: int | None = None, event_buffer_max_size: int = 0, attention_dp_events_gather_period_ms: int = 5, enable_partial_reuse: bool = True, copy_on_partial_reuse: bool = True, use_uvm: bool = False, max_gpu_total_bytes: int = 0, dtype: str = 'auto', mamba_ssm_cache_dtype: Literal['auto', 'float16', 'bfloat16', 'float32'] = 'auto', tokens_per_block: int = 32)

Bases: StrictBaseModel, PybindMirror

Configuration for the KV cache.

field attention_dp_events_gather_period_ms: int = 5: The period in milliseconds to gather attention DP events across ranks.

field copy_on_partial_reuse: bool = True: Whether partially matched blocks that are in use can be reused after copying them.

field cross_kv_cache_fraction: float | None = None: The fraction of the KV cache memory that should be reserved for cross attention. If set to p, self attention uses 1-p of the KV cache memory and cross attention uses p. Default is 50%. Should only be set when using an encoder-decoder model.
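A worked split of the KV cache budget between self and cross attention, as a minimal sketch. split_kv_budget is hypothetical; treating an unset fraction as the documented 50% and rejecting values outside [0, 1] are assumptions, since the field itself defaults to None.

```python
from typing import Optional, Tuple


def split_kv_budget(total_bytes: int,
                    cross_kv_cache_fraction: Optional[float] = None) -> Tuple[int, int]:
    """Return (self_attention_bytes, cross_attention_bytes)."""
    # Assumption: an unset fraction falls back to the documented 50%.
    p = 0.5 if cross_kv_cache_fraction is None else cross_kv_cache_fraction
    if not 0.0 <= p <= 1.0:
        raise ValueError("cross_kv_cache_fraction must be in [0, 1]")
    cross = int(total_bytes * p)
    return total_bytes - cross, cross


if __name__ == "__main__":
    print(split_kv_budget(1000, 0.25))
```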

field dtype: str = 'auto': The data type to use for the KV cache.

field enable_block_reuse: bool = True: Controls whether KV cache blocks can be reused across different requests.

field enable_partial_reuse: bool = True: Whether blocks that are only partially matched can be reused.

field event_buffer_max_size: int = 0: Maximum size of the event buffer. If set to 0, the event buffer is not used.

field free_gpu_memory_fraction: float | None = 0.9: The fraction of free GPU memory that should be allocated for the KV cache. Default is 90%. If both max_tokens and free_gpu_memory_fraction are specified, memory corresponding to the minimum is used.

field host_cache_size: int | None = None: Size of the host cache in bytes. If both max_tokens and host_cache_size are specified, memory corresponding to the minimum is used.

field mamba_ssm_cache_dtype: Literal['auto', 'float16', 'bfloat16', 'float32'] = 'auto': The data type to use for the Mamba SSM cache. If set to ‘auto’, the data type is inferred from the model config.

field max_attention_window: List[int] | None = None: Size of the attention window for each sequence. Only the last max_attention_window tokens of each sequence are stored in the KV cache. If max_attention_window has fewer elements than the model has layers, the list is repeated until it covers all layers.
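The repetition rule for a short max_attention_window list can be sketched by cycling the list across layers. expand_attention_window is a hypothetical helper; truncating a list that is longer than the number of layers is an assumption, since the text only covers the shorter case.

```python
from itertools import cycle, islice
from typing import List


def expand_attention_window(max_attention_window: List[int], num_layers: int) -> List[int]:
    """Repeat the per-layer window pattern until every layer has a value."""
    if not max_attention_window:
        raise ValueError("max_attention_window must be non-empty")
    # islice also truncates a longer list to num_layers (assumption).
    return list(islice(cycle(max_attention_window), num_layers))


if __name__ == "__main__":
    # e.g. alternating full and sliding-window layers in a 5-layer model
    print(expand_attention_window([4096, 1024], 5))
```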

field max_gpu_total_bytes: int = 0: The maximum size in bytes of GPU memory that can be allocated for the KV cache. If both max_gpu_total_bytes and free_gpu_memory_fraction are specified, memory corresponding to the minimum is allocated.

field max_tokens: int | None = None: The maximum number of tokens that should be stored in the KV cache. If both max_tokens and free_gpu_memory_fraction are specified, memory corresponding to the minimum is used.
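The "minimum wins" interaction between free_gpu_memory_fraction, max_gpu_total_bytes, and max_tokens can be worked through with the standard per-token KV size, 2 × layers × KV heads × head dim × bytes per element (one K and one V entry per layer). kv_cache_token_capacity is a hypothetical helper; treating max_gpu_total_bytes = 0 as "no cap" is an assumption, and the real allocator also works in blocks of tokens_per_block and may reserve extra memory, so actual capacities will differ.

```python
from typing import Optional


def kv_bytes_per_token(num_layers: int, num_kv_heads: int, head_dim: int,
                       dtype_bytes: int) -> int:
    # One K and one V vector of num_kv_heads * head_dim elements per layer.
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes


def kv_cache_token_capacity(free_gpu_bytes: int, bytes_per_token: int,
                            free_gpu_memory_fraction: float = 0.9,
                            max_tokens: Optional[int] = None,
                            max_gpu_total_bytes: int = 0) -> int:
    budget = int(free_gpu_bytes * free_gpu_memory_fraction)
    if max_gpu_total_bytes > 0:  # assumption: 0 means "no byte cap"
        budget = min(budget, max_gpu_total_bytes)
    tokens = budget // bytes_per_token
    if max_tokens is not None:
        tokens = min(tokens, max_tokens)  # the smaller limit wins
    return tokens


if __name__ == "__main__":
    # Hypothetical model: 32 layers, 8 KV heads, head_dim 128, FP16 cache.
    per_token = kv_bytes_per_token(32, 8, 128, 2)
    print(per_token, kv_cache_token_capacity(8 * 2**30, per_token, 0.5))
```

For the hypothetical model above, each token costs 128 KiB, so half of 8 GiB of free memory holds 32768 tokens before max_tokens or max_gpu_total_bytes are applied.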

field onboard_blocks: bool = True: Controls whether blocks are onboarded.

field secondary_offload_min_priority: int | None = None: Only blocks with priority > secondary_offload_min_priority can be offloaded to secondary memory.

field sink_token_length: int | None = None: Number of sink tokens (tokens to always keep in the attention window).

field tokens_per_block: int = 32: The number of tokens per block.

field use_uvm: bool = False: Whether to use UVM (unified virtual memory) for the KV cache.

class Config

Bases: object

extra = 'forbid'

__init__(**data: Any) → None

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError (pydantic_core.ValidationError) if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

classmethod construct(_fields_set: set[str] | None = None, **values: Any) → Self

Returns a copy of the model.

Warning (deprecated): This method is deprecated; use model_copy instead.

If you need include or exclude, use:

```python
data = self.model_dump(include=include, exclude=exclude, round_trip=True)
data = {**data, **(update or {})}
copied = self.model_validate(data)
```

Parameters:

include – Optional set or mapping specifying which fields to include in the copied model.
exclude – Optional set or mapping specifying which fields to exclude in the copied model.
update – Optional dictionary of field-value pairs to override field values in the copied model.
deep – If True, the values of fields that are Pydantic models will be deep-copied.

Returns:

A copy of the model with included, excluded and updated fields as specified.

dict(*, include: set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None = None, exclude: set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None = None, by_alias: bool = False, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False) → Dict[str, Any]

classmethod from_orm(obj: Any) → Self

classmethod from_pybind(pybind_instance: PybindMirror) → T

Construct an instance of the given class from the fields in the given pybind class instance.

Parameters:

cls – Type of the class to construct; must be a subclass of pydantic BaseModel.
pybind_instance – Instance of the pybind class to construct from its fields.

Notes

When a field value is None in the pybind class, but the field is not optional and has a default value in the BaseModel class, it gets the default value defined in the BaseModel class.

Returns:: Instance of the given class, populated with the fields of the given pybind instance.

static get_pybind_enum_fields(pybind_class): Get all the enum fields from the pybind class.

static get_pybind_variable_fields(config_cls): Get all the variable fields from the pybind class.

json(
*,
include: set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None = None,
exclude: set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None = None,
by_alias: bool = False,
exclude_unset: bool = False,
exclude_defaults: bool = False,
exclude_none: bool = False,
encoder: Callable[[Any], Any] | None = PydanticUndefined,
models_as_dict: bool = PydanticUndefined,
**dumps_kwargs: Any,
) → str

static maybe_to_pybind(ins)

static mirror_pybind_enum(pybind_class): Mirror the enum fields from the pybind class to the Python class.

static mirror_pybind_fields(pybind_class)

Class decorator that ensures Python class fields mirror those of a C++ class.

Parameters:: pybind_class – The C++ class whose fields should be mirrored.
Returns:: A decorator function that validates field mirroring.

classmethod model_construct(
_fields_set: set[str] | None = None,
**values: Any,
) → Self

Creates a new instance of the Model class with validated data.

Creates a new model setting __dict__ and __pydantic_fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed.

Note: model_construct() generally respects the model_config.extra setting on the provided model. That is, if model_config.extra == ‘allow’, then all extra passed values are added to the model instance’s __dict__ and __pydantic_extra__ fields. If model_config.extra == ‘ignore’ (the default), then all extra passed values are ignored. Because no validation is performed with a call to model_construct(), having model_config.extra == ‘forbid’ does not result in an error if extra values are passed, but they will be ignored.

Parameters:

_fields_set – A set of field names that were originally explicitly set during instantiation. If provided, this is directly used for the model_fields_set attribute (pydantic.BaseModel.model_fields_set). Otherwise, the field names from the values argument will be used.
values – Trusted or pre-validated data dictionary.

Returns:

A new instance of the Model class with validated data.

model_copy(*, update: Mapping[str, Any] | None = None, deep: bool = False) → Self

Usage documentation: model_copy (Pydantic serialization concepts).

Returns a copy of the model.

Note: The underlying instance’s __dict__ attribute is copied. This might have unexpected side effects if you store anything in it on top of the model fields (e.g. the value of cached properties, functools.cached_property).

Parameters:

update – Values to change/add in the new model. Note: the data is not validated before creating the new model. You should trust this data.
deep – Set to True to make a deep copy of the model.

Returns:

New model instance.

Usage documentation: model_dump (Pydantic serialization concepts).

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

Parameters:

mode – The mode in which to_python should run. If mode is ‘json’, the output will only contain JSON-serializable types. If mode is ‘python’, the output may contain non-JSON-serializable Python objects.
include – A set of fields to include in the output.
exclude – A set of fields to exclude from the output.
context – Additional context to pass to the serializer.
by_alias – Whether to use the field’s alias in the dictionary key if defined.
exclude_unset – Whether to exclude fields that have not been explicitly set.
exclude_defaults – Whether to exclude fields that are set to their default value.
exclude_none – Whether to exclude fields that have a value of None.
round_trip – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings – How to handle serialization errors. False/“none” ignores them, True/“warn” logs errors, and “error” raises a pydantic_core.PydanticSerializationError.
fallback – A function to call when an unknown value is encountered. If not provided, a pydantic_core.PydanticSerializationError is raised.
serialize_as_any – Whether to serialize fields with duck-typing serialization behavior.

Returns:

A dictionary representation of the model.

Usage documentation: model_dump_json (Pydantic serialization concepts).

Generates a JSON representation of the model using Pydantic’s to_json method.

Parameters:

indent – Indentation to use in the JSON output. If None is passed, the output will be compact.
include – Field(s) to include in the JSON output.
exclude – Field(s) to exclude from the JSON output.
context – Additional context to pass to the serializer.
by_alias – Whether to serialize using field aliases.
exclude_unset – Whether to exclude fields that have not been explicitly set.
exclude_defaults – Whether to exclude fields that are set to their default value.
exclude_none – Whether to exclude fields that have a value of None.
round_trip – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings – How to handle serialization errors. False/“none” ignores them, True/“warn” logs errors, and “error” raises a pydantic_core.PydanticSerializationError.
fallback – A function to call when an unknown value is encountered. If not provided, a pydantic_core.PydanticSerializationError is raised.
serialize_as_any – Whether to serialize fields with duck-typing serialization behavior.

Returns:

A JSON string representation of the model.

classmethod model_json_schema(by_alias: bool = True, ref_template: str = '#/$defs/{model}', schema_generator: type[pydantic.json_schema.GenerateJsonSchema] = <class 'pydantic.json_schema.GenerateJsonSchema'>, mode: Literal['validation', 'serialization'] = 'validation') → dict[str, Any]

Generates a JSON schema for a model class.

Parameters:

by_alias – Whether to use attribute aliases or not.
ref_template – The reference template.
schema_generator – To override the logic used to generate the JSON schema, as a subclass of GenerateJsonSchema with your desired modifications.
mode – The mode in which to generate the schema.

Returns:

The JSON schema for the given model class.

classmethod model_parametrized_name(params: tuple[type[Any], ...]) → str

Compute the class name for parametrizations of generic classes.

This method can be overridden to achieve a custom naming scheme for generic BaseModels.

Parameters:: params – Tuple of types of the class. Given a generic class Model with 2 type variables and a concrete model Model[str, int], the value (str, int) would be passed to params.
Returns:: String representing the new class where params are passed to cls as type variables.
Raises:: TypeError – Raised when trying to generate concrete names for non-generic models.

model_post_init(context: Any, /) → None: Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

classmethod model_rebuild(*, force: bool = False, raise_errors: bool = True, _parent_namespace_depth: int = 2, _types_namespace: MappingNamespace | None = None) → bool | None

Try to rebuild the pydantic-core schema for the model.

This may be necessary when one of the annotations is a ForwardRef which could not be resolved during the initial attempt to build the schema, and automatic rebuilding fails.

Parameters:

force – Whether to force the rebuilding of the model schema, defaults to False.
raise_errors – Whether to raise errors, defaults to True.
_parent_namespace_depth – The depth level of the parent namespace, defaults to 2.
_types_namespace – The types namespace, defaults to None.

Returns:

Returns None if the schema is already “complete” and rebuilding was not required. If rebuilding was required, returns True if rebuilding was successful, otherwise False.

Validate a pydantic model instance.

Parameters:

obj – The object to validate.
strict – Whether to enforce types strictly.
from_attributes – Whether to extract data from object attributes.
context – Additional context to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Raises:

ValidationError – If the object could not be validated.

Returns:

The validated model instance.

Usage documentation: JSON Parsing (Pydantic JSON concepts).

Validate the given JSON data against the Pydantic model.

Parameters:

json_data – The JSON data to validate.
strict – Whether to enforce types strictly.
context – Extra variables to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Returns:

The validated Pydantic model.

Raises:

ValidationError – If json_data is not a JSON string or the object could not be validated.

classmethod model_validate_strings(obj: Any, *, strict: bool | None = None, context: Any | None = None, by_alias: bool | None = None, by_name: bool | None = None) → Self

Validate the given object with string data against the Pydantic model.

Parameters:

obj – The object containing string data to validate.
strict – Whether to enforce types strictly.
context – Extra variables to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Returns:

The validated Pydantic model.

classmethod parse_file(path: str | Path, *, content_type: str | None = None, encoding: str = 'utf8', proto: DeprecatedParseProtocol | None = None, allow_pickle: bool = False) → Self

classmethod parse_obj(obj: Any) → Self

classmethod parse_raw(b: str | bytes, *, content_type: str | None = None, encoding: str = 'utf8', proto: DeprecatedParseProtocol | None = None, allow_pickle: bool = False) → Self

static pybind_equals(obj0, obj1): Check whether two pybind objects are equal.

classmethod schema(by_alias: bool = True, ref_template: str = '#/$defs/{model}') → Dict[str, Any]

classmethod schema_json(
*,
by_alias: bool = True,
ref_template: str = '#/$defs/{model}',
**dumps_kwargs: Any,
) → str

classmethod update_forward_refs(**localns: Any) → None

classmethod validate(value: Any) → Self

validator validate_free_gpu_memory_fraction » free_gpu_memory_fraction: Validates that the fraction is between 0.0 and 1.0.

validator validate_max_attention_window » max_attention_window
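The max_attention_window field description (in model_fields) states that a list shorter than the number of layers is repeated until it covers every layer. A minimal stdlib-only sketch of that expansion follows; expand_attention_window is a hypothetical helper, not part of tensorrt_llm, and it assumes the pattern is truncated when the layer count is not a multiple of the list length (or when the list is longer than the layer count).

```python
from typing import List


def expand_attention_window(windows: List[int], num_layers: int) -> List[int]:
    """Repeat a per-layer window pattern until it covers num_layers layers.

    Hypothetical helper mirroring the max_attention_window description;
    it is not the tensorrt_llm implementation.
    """
    if not windows:
        raise ValueError("windows must contain at least one entry")
    if num_layers <= 0:
        raise ValueError("num_layers must be positive")
    # Cycle through the pattern, e.g. alternating local/global windows per layer.
    return [windows[i % len(windows)] for i in range(num_layers)]


# A two-entry pattern spread across 5 layers.
print(expand_attention_window([4096, 32768], 5))
# [4096, 32768, 4096, 32768, 4096]
```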

validator validate_max_gpu_total_bytes » max_gpu_total_bytes

model_computed_fields = {}

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}: Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).

property model_extra: dict[str, Any] | None

Get extra fields set during validation.

Returns: A dictionary of extra fields, or None if config.extra is not set to “allow”.

model_fields = {
'attention_dp_events_gather_period_ms': FieldInfo(annotation=int, required=False, default=5, description='The period in milliseconds to gather attention DP events across ranks.'),
'copy_on_partial_reuse': FieldInfo(annotation=bool, required=False, default=True, description='Whether partially matched blocks that are in use can be reused after copying them.'),
'cross_kv_cache_fraction': FieldInfo(annotation=Union[float, NoneType], required=False, default=None, description='The fraction of the KV Cache memory should be reserved for cross attention. If set to p, self attention will use 1-p of KV Cache memory and cross attention will use p of KV Cache memory. Default is 50%. Should only be set when using encoder-decoder model.'),
'dtype': FieldInfo(annotation=str, required=False, default='auto', description='The data type to use for the KV cache.'),
'enable_block_reuse': FieldInfo(annotation=bool, required=False, default=True, description='Controls if KV cache blocks can be reused for different requests.'),
'enable_partial_reuse': FieldInfo(annotation=bool, required=False, default=True, description='Whether blocks that are only partially matched can be reused.'),
'event_buffer_max_size': FieldInfo(annotation=int, required=False, default=0, description='Maximum size of the event buffer. If set to 0, the event buffer will not be used.'),
'free_gpu_memory_fraction': FieldInfo(annotation=Union[float, NoneType], required=False, default=0.9, description='The fraction of GPU memory fraction that should be allocated for the KV cache. Default is 90%. If both `max_tokens` and `free_gpu_memory_fraction` are specified, memory corresponding to the minimum will be used.'),
'host_cache_size': FieldInfo(annotation=Union[int, NoneType], required=False, default=None, description='Size of the host cache in bytes. If both `max_tokens` and `host_cache_size` are specified, memory corresponding to the minimum will be used.'),
'mamba_ssm_cache_dtype': FieldInfo(annotation=Literal['auto', 'float16', 'bfloat16', 'float32'], required=False, default='auto', description="The data type to use for the Mamba SSM cache. If set to 'auto', the data type will be inferred from the model config."),
'max_attention_window': FieldInfo(annotation=Union[List[int], NoneType], required=False, default=None, description='Size of the attention window for each sequence. Only the last tokens will be stored in the KV cache. If the number of elements in `max_attention_window` is less than the number of layers, `max_attention_window` will be repeated multiple times to the number of layers.'),
'max_gpu_total_bytes': FieldInfo(annotation=int, required=False, default=0, description='The maximum size in bytes of GPU memory that can be allocated for the KV cache. If both `max_gpu_total_bytes` and `free_gpu_memory_fraction` are specified, memory corresponding to the minimum will be allocated.'),
'max_tokens': FieldInfo(annotation=Union[int, NoneType], required=False, default=None, description='The maximum number of tokens that should be stored in the KV cache. If both `max_tokens` and `free_gpu_memory_fraction` are specified, memory corresponding to the minimum will be used.'),
'onboard_blocks': FieldInfo(annotation=bool, required=False, default=True, description='Controls if blocks are onboarded.'),
'secondary_offload_min_priority': FieldInfo(annotation=Union[int, NoneType], required=False, default=None, description='Only blocks with priority > mSecondaryOfflineMinPriority can be offloaded to secondary memory.'),
'sink_token_length': FieldInfo(annotation=Union[int, NoneType], required=False, default=None, description='Number of sink tokens (tokens to always keep in attention window).'),
'tokens_per_block': FieldInfo(annotation=int, required=False, default=32, description='The number of tokens per block.'),
'use_uvm': FieldInfo(annotation=bool, required=False, default=False, description='Whether to use UVM for the KV cache.')
}
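Several of the field descriptions above say that when both max_tokens and free_gpu_memory_fraction are specified, the smaller of the two budgets wins. The arithmetic can be sketched as follows, assuming the fraction applies to the free GPU memory left after the model is loaded and that the per-token KV cache size is known. effective_kv_cache_tokens is a hypothetical helper, not a tensorrt_llm API, and the bytes-per-token value in the example is made up (the real value depends on layer count, KV heads, head dimension, and cache dtype).

```python
from typing import Optional


def effective_kv_cache_tokens(
    free_gpu_bytes: int,
    bytes_per_token: int,
    free_gpu_memory_fraction: float = 0.9,
    max_tokens: Optional[int] = None,
) -> int:
    """Token capacity of the KV cache when both limits may be set.

    Hypothetical sketch: the budget implied by the memory fraction is
    compared with max_tokens and the smaller one is used.
    """
    # Assumed inclusive bounds, echoing validate_free_gpu_memory_fraction.
    if not 0.0 <= free_gpu_memory_fraction <= 1.0:
        raise ValueError("free_gpu_memory_fraction must be in [0.0, 1.0]")
    if bytes_per_token <= 0:
        raise ValueError("bytes_per_token must be positive")
    # Whole tokens that fit in the fraction of free memory.
    tokens_from_fraction = int(free_gpu_bytes * free_gpu_memory_fraction) // bytes_per_token
    if max_tokens is None:
        return tokens_from_fraction
    # Both limits given: the one corresponding to less memory wins.
    return min(tokens_from_fraction, max_tokens)


# 8 GiB free, half of it for the KV cache, 128 KiB per token (illustrative).
print(effective_kv_cache_tokens(8 * 2**30, 2**17, 0.5))                    # 32768
print(effective_kv_cache_tokens(8 * 2**30, 2**17, 0.5, max_tokens=20000))  # 20000
```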

propertymodel_fields_set:set[str]#

Returns the set of fields that have been explicitly set on this model instance.

Returns:

A set of strings representing the fields that have been set, i.e. that were not filled from defaults.

class tensorrt_llm.llmapi.KvCacheRetentionConfig(*args, **kwargs)

Bases: object

class TokenRangeRetentionConfig(*args, **kwargs)

Bases: object

__init__

property duration_ms: (self) -> datetime.timedelta | None

property priority: (self) -> int

property token_end: (self) -> int | None

property token_start: (self) -> int

__init__

property decode_duration_ms: (self) -> datetime.timedelta | None

property decode_retention_priority: (self) -> int

property directory: (self) -> str

property token_range_retention_configs: (self) -> list[tensorrt_llm.bindings.executor.KvCacheRetentionConfig.TokenRangeRetentionConfig]

property transfer_mode: (self) -> tensorrt_llm.bindings.executor.KvCacheTransferMode
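Each TokenRangeRetentionConfig pairs a token range (token_start, token_end) with a priority. The sketch below shows how a per-token priority could be resolved from such ranges. It is a stdlib-only illustration, not the binding: TokenRange and priority_for_token are hypothetical, and it assumes token_end is exclusive, that token_end=None means "until the end of the sequence", that the first matching range wins, and that uncovered tokens fall back to a caller-supplied default.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TokenRange:
    """Hypothetical stand-in for KvCacheRetentionConfig.TokenRangeRetentionConfig."""
    token_start: int
    token_end: Optional[int]  # assumed exclusive; None = open-ended
    priority: int


def priority_for_token(ranges: List[TokenRange], index: int, default_priority: int) -> int:
    """Return the priority of the first range covering index, else the default."""
    for r in ranges:
        within_end = r.token_end is None or index < r.token_end
        if r.token_start <= index and within_end:
            return r.priority
    # Token not covered by any range.
    return default_priority


# Keep a 64-token system prompt at high priority; very long tails at low priority.
ranges = [TokenRange(0, 64, 90), TokenRange(1000, None, 5)]
print(priority_for_token(ranges, 10, 35))    # 90
print(priority_for_token(ranges, 64, 35))    # 35
print(priority_for_token(ranges, 5000, 35))  # 5
```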

class tensorrt_llm.llmapi.CudaGraphConfig(*, batch_sizes: List[int] | None = None, max_batch_size: int = 0, enable_padding: bool = False)

Bases: StrictBaseModel

Configuration for CUDA graphs.

field batch_sizes: List[int] | None = None: List of batch sizes to create CUDA graphs for.

field enable_padding: bool = False: If true, batches are rounded up to the nearest cuda_graph_batch_size. This is usually a net win for performance.

field max_batch_size: int = 0: Maximum batch size for CUDA graphs.
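With enable_padding, a batch is rounded up to the nearest batch size for which a CUDA graph was captured. That lookup can be sketched with the standard bisect module; padded_graph_batch_size is a hypothetical helper, and returning None for batches larger than every captured size (meaning "no graph for this step") is an assumption of the sketch, not documented behavior.

```python
import bisect
from typing import List, Optional


def padded_graph_batch_size(batch_size: int, graph_batch_sizes: List[int]) -> Optional[int]:
    """Smallest captured CUDA graph batch size that can hold batch_size.

    Hypothetical sketch; returns None when the batch exceeds every captured size.
    """
    sizes = sorted(graph_batch_sizes)
    # First captured size >= batch_size.
    i = bisect.bisect_left(sizes, batch_size)
    return sizes[i] if i < len(sizes) else None


captured = [1, 2, 4, 8, 16, 32]
print(padded_graph_batch_size(5, captured))   # 8 (3 padded slots)
print(padded_graph_batch_size(8, captured))   # 8
print(padded_graph_batch_size(33, captured))  # None
```

The trade-off the field description alludes to: a few wasted padded slots are usually cheaper than launching the step without a captured graph.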

class Config

Bases: object

extra = 'forbid'

__init__(**data: Any) → None

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError (pydantic_core.ValidationError) if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

classmethod construct(
_fields_set: set[str] | None = None,
**values: Any,
) → Self

Returns a copy of the model.

Deprecated: This method is now deprecated; use model_copy instead.

If you need include or exclude, use:

```python
data = self.model_dump(include=include, exclude=exclude, round_trip=True)
data = {**data, **(update or {})}
copied = self.model_validate(data)
```

Parameters:

include – Optional set or mapping specifying which fields to include in the copied model.
exclude – Optional set or mapping specifying which fields to exclude in the copied model.
update – Optional dictionary of field-value pairs to override field values in the copied model.
deep – If True, the values of fields that are Pydantic models will be deep-copied.

Returns:

A copy of the model with included, excluded and updated fields as specified.

dict(*, include: set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None = None, exclude: set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None = None, by_alias: bool = False, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False) → Dict[str, Any]

classmethod from_orm(obj: Any) → Self

json(
*,
include: set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None = None,
exclude: set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None = None,
by_alias: bool = False,
exclude_unset: bool = False,
exclude_defaults: bool = False,
exclude_none: bool = False,
encoder: Callable[[Any], Any] | None = PydanticUndefined,
models_as_dict: bool = PydanticUndefined,
**dumps_kwargs: Any,
) → str

classmethod model_construct(
_fields_set: set[str] | None = None,
**values: Any,
) → Self

Creates a new instance of the Model class with validated data.

Creates a new model setting __dict__ and __pydantic_fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed.

Note: model_construct() generally respects the model_config.extra setting on the provided model. That is, if model_config.extra == ‘allow’, then all extra passed values are added to the model instance’s __dict__ and __pydantic_extra__ fields. If model_config.extra == ‘ignore’ (the default), then all extra passed values are ignored. Because no validation is performed with a call to model_construct(), having model_config.extra == ‘forbid’ does not result in an error if extra values are passed, but they will be ignored.

Parameters:

_fields_set – A set of field names that were originally explicitly set during instantiation. If provided, this is directly used for the model_fields_set (pydantic.BaseModel.model_fields_set) attribute. Otherwise, the field names from the values argument will be used.
values – Trusted or pre-validated data dictionary.

Returns:

A new instance of the Model class with validated data.

model_copy(*, update: Mapping[str, Any] | None = None, deep: bool = False) → Self

Usage documentation: see “model_copy” in the Pydantic serialization documentation.

Returns a copy of the model.

Note: The underlying instance’s __dict__ (object.__dict__) attribute is copied. This might have unexpected side effects if you store anything in it on top of the model fields (e.g. the value of cached properties, functools.cached_property).

Parameters:

update – Values to change/add in the new model. Note: the data is not validated before creating the new model. You should trust this data.
deep – Set to True to make a deep copy of the model.

Returns:

New model instance.

Usage documentation: see “model_dump” in the Pydantic serialization documentation.

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

Parameters:

mode – The mode in which to_python should run. If mode is ‘json’, the output will only contain JSON-serializable types. If mode is ‘python’, the output may contain non-JSON-serializable Python objects.
include – A set of fields to include in the output.
exclude – A set of fields to exclude from the output.
context – Additional context to pass to the serializer.
by_alias – Whether to use the field’s alias in the dictionary key if defined.
exclude_unset – Whether to exclude fields that have not been explicitly set.
exclude_defaults – Whether to exclude fields that are set to their default value.
exclude_none – Whether to exclude fields that have a value of None.
round_trip – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings – How to handle serialization errors. False/“none” ignores them, True/“warn” logs errors, and “error” raises a PydanticSerializationError (pydantic_core.PydanticSerializationError).
fallback – A function to call when an unknown value is encountered. If not provided, a PydanticSerializationError (pydantic_core.PydanticSerializationError) is raised.
serialize_as_any – Whether to serialize fields with duck-typing serialization behavior.

Returns:

A dictionary representation of the model.

Usage documentation: see “model_dump_json” in the Pydantic serialization documentation.

Generates a JSON representation of the model using Pydantic’s to_json method.

Parameters:

indent – Indentation to use in the JSON output. If None is passed, the output will be compact.
include – Field(s) to include in the JSON output.
exclude – Field(s) to exclude from the JSON output.
context – Additional context to pass to the serializer.
by_alias – Whether to serialize using field aliases.
exclude_unset – Whether to exclude fields that have not been explicitly set.
exclude_defaults – Whether to exclude fields that are set to their default value.
exclude_none – Whether to exclude fields that have a value of None.
round_trip – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings – How to handle serialization errors. False/“none” ignores them, True/“warn” logs errors, and “error” raises a PydanticSerializationError (pydantic_core.PydanticSerializationError).
fallback – A function to call when an unknown value is encountered. If not provided, a PydanticSerializationError (pydantic_core.PydanticSerializationError) is raised.
serialize_as_any – Whether to serialize fields with duck-typing serialization behavior.

Returns:

A JSON string representation of the model.

Generates a JSON schema for a model class.

Parameters:

by_alias – Whether to use attribute aliases or not.
ref_template – The reference template.
schema_generator – To override the logic used to generate the JSON schema, pass a subclass of GenerateJsonSchema with your desired modifications.
mode – The mode in which to generate the schema.

Returns:

The JSON schema for the given model class.

classmethod model_parametrized_name(params: tuple[type[Any], ...]) → str

Compute the class name for parametrizations of generic classes.

This method can be overridden to achieve a custom naming scheme for generic BaseModels.

Parameters: params – Tuple of types of the class. Given a generic class Model with 2 type variables and a concrete model Model[str, int], the value (str, int) would be passed to params.
Returns: String representing the new class where params are passed to cls as type variables.
Raises: TypeError – Raised when trying to generate concrete names for non-generic models.

model_post_init(context: Any, /) → None: Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

classmethod model_rebuild(*, force: bool = False, raise_errors: bool = True, _parent_namespace_depth: int = 2, _types_namespace: MappingNamespace | None = None) → bool | None

Try to rebuild the pydantic-core schema for the model.

This may be necessary when one of the annotations is a ForwardRef which could not be resolved during the initial attempt to build the schema, and automatic rebuilding fails.

Parameters:

force – Whether to force the rebuilding of the model schema, defaults to False.
raise_errors – Whether to raise errors, defaults to True.
_parent_namespace_depth – The depth level of the parent namespace, defaults to 2.
_types_namespace – The types namespace, defaults to None.

Returns:

Returns None if the schema is already “complete” and rebuilding was not required. If rebuilding was required, returns True if rebuilding was successful, otherwise False.

Validate a pydantic model instance.

Parameters:

obj – The object to validate.
strict – Whether to enforce types strictly.
from_attributes – Whether to extract data from object attributes.
context – Additional context to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Raises:

ValidationError – If the object could not be validated.

Returns:

The validated model instance.

Usage documentation: see “JSON Parsing” in the Pydantic documentation.

Validate the given JSON data against the Pydantic model.

Parameters:

json_data – The JSON data to validate.
strict – Whether to enforce types strictly.
context – Extra variables to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Returns:

The validated Pydantic model.

Raises:

ValidationError – If json_data is not a JSON string or the object could not be validated.

classmethod model_validate_strings(obj: Any, *, strict: bool | None = None, context: Any | None = None, by_alias: bool | None = None, by_name: bool | None = None) → Self

Validate the given object with string data against the Pydantic model.

Parameters:

obj – The object containing string data to validate.
strict – Whether to enforce types strictly.
context – Extra variables to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Returns:

The validated Pydantic model.

classmethod parse_file(path: str | Path, *, content_type: str | None = None, encoding: str = 'utf8', proto: DeprecatedParseProtocol | None = None, allow_pickle: bool = False) → Self

classmethod parse_obj(obj: Any) → Self

classmethod parse_raw(b: str | bytes, *, content_type: str | None = None, encoding: str = 'utf8', proto: DeprecatedParseProtocol | None = None, allow_pickle: bool = False) → Self

classmethod schema(by_alias: bool = True, ref_template: str = '#/$defs/{model}') → Dict[str, Any]

classmethod schema_json(
*,
by_alias: bool = True,
ref_template: str = '#/$defs/{model}',
**dumps_kwargs: Any,
) → str

classmethod update_forward_refs(**localns: Any) → None

classmethod validate(value: Any) → Self

validator validate_cuda_graph_max_batch_size » max_batch_size: Validates that cuda_graph_config.max_batch_size is non-negative.

model_computed_fields = {}

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}: Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).

property model_extra: dict[str, Any] | None

Get extra fields set during validation.

Returns: A dictionary of extra fields, or None if config.extra is not set to “allow”.

model_fields = {
'batch_sizes': FieldInfo(annotation=Union[List[int], NoneType], required=False, default=None, description='List of batch sizes to create CUDA graphs for.'),
'enable_padding': FieldInfo(annotation=bool, required=False, default=False, description='If true, batches are rounded up to the nearest cuda_graph_batch_size. This is usually a net win for performance.'),
'max_batch_size': FieldInfo(annotation=int, required=False, default=0, description='Maximum batch size for CUDA graphs.')
}

propertymodel_fields_set:set[str]#

Returns the set of fields that have been explicitly set on this model instance.

Returns:

A set of strings representing the fields that have been set, i.e. that were not filled from defaults.

class tensorrt_llm.llmapi.MoeConfig(*, backend: Literal['CUTLASS', 'CUTEDSL', 'WIDEEP', 'TRTLLM', 'DEEPGEMM', 'VANILLA', 'TRITON'] = 'CUTLASS', max_num_tokens: int | None = None, load_balancer: object | str | None = None, disable_finalize_fusion: bool = False, use_low_precision_moe_combine: bool = False)

Bases: StrictBaseModel

Configuration for MoE.

field backend: Literal['CUTLASS', 'CUTEDSL', 'WIDEEP', 'TRTLLM', 'DEEPGEMM', 'VANILLA', 'TRITON'] = 'CUTLASS': MoE backend to use.

field disable_finalize_fusion: bool = False: Disable FC2+finalize kernel fusion in the CUTLASS MoE backend. Setting this to True recovers deterministic numerical behavior with top-k > 2.

field load_balancer: object | str | None = None: Configuration for MoE load balancing.

field max_num_tokens: int | None = None: If set, at most max_num_tokens tokens are sent to torch.ops.trtllm.fused_moe at a time. If the number of tokens exceeds max_num_tokens, the input tensors are split into chunks that are processed one after another in a loop.

field use_low_precision_moe_combine: bool = False: Use low-precision combine in MoE operations (NVFP4 quantization only). When enabled, expert outputs are combined at lower precision to improve performance.
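The max_num_tokens field caps how many tokens reach the fused MoE op in one call and loops over chunks when the input is larger. The chunk boundaries can be sketched as half-open slices; moe_chunks is a hypothetical helper, and the actual call to torch.ops.trtllm.fused_moe (named in the field description) is deliberately not made here.

```python
from typing import List, Optional, Tuple


def moe_chunks(num_tokens: int, max_num_tokens: Optional[int]) -> List[Tuple[int, int]]:
    """Split [0, num_tokens) into half-open slices of at most max_num_tokens.

    Hypothetical sketch of the chunking described for MoeConfig.max_num_tokens.
    """
    if max_num_tokens is not None and max_num_tokens <= 0:
        raise ValueError("max_num_tokens must be positive when set")
    # Unset, or small enough: a single call covers every token.
    if max_num_tokens is None or num_tokens <= max_num_tokens:
        return [(0, num_tokens)]
    return [
        (start, min(start + max_num_tokens, num_tokens))
        for start in range(0, num_tokens, max_num_tokens)
    ]


print(moe_chunks(10, 4))     # [(0, 4), (4, 8), (8, 10)]
print(moe_chunks(10, None))  # [(0, 10)]
```

Each slice would then be fed to the MoE kernel in turn, trading one large launch for several bounded ones (for example, to cap workspace memory).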

class Config

Bases: object

extra = 'forbid'

__init__(**data: Any) → None

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError (pydantic_core.ValidationError) if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

classmethod construct(
_fields_set: set[str] | None = None,
**values: Any,
) → Self

Returns a copy of the model.

Deprecated: This method is now deprecated; use model_copy instead.

If you need include or exclude, use:

```python
data = self.model_dump(include=include, exclude=exclude, round_trip=True)
data = {**data, **(update or {})}
copied = self.model_validate(data)
```

Parameters:

include – Optional set or mapping specifying which fields to include in the copied model.
exclude – Optional set or mapping specifying which fields to exclude in the copied model.
update – Optional dictionary of field-value pairs to override field values in the copied model.
deep – If True, the values of fields that are Pydantic models will be deep-copied.

Returns:

A copy of the model with included, excluded and updated fields as specified.

dict(*, include: set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None = None, exclude: set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None = None, by_alias: bool = False, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False) → Dict[str, Any]

classmethod from_dict(data: dict)

classmethod from_orm(obj: Any) → Self

json(
*,
include: set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None = None,
exclude: set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None = None,
by_alias: bool = False,
exclude_unset: bool = False,
exclude_defaults: bool = False,
exclude_none: bool = False,
encoder: Callable[[Any], Any] | None = PydanticUndefined,
models_as_dict: bool = PydanticUndefined,
**dumps_kwargs: Any,
) → str

classmethod model_construct(
_fields_set: set[str] | None = None,
**values: Any,
) → Self

Creates a new instance of the Model class with validated data.

Creates a new model setting __dict__ and __pydantic_fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed.

Note: model_construct() generally respects the model_config.extra setting on the provided model. That is, if model_config.extra == ‘allow’, then all extra passed values are added to the model instance’s __dict__ and __pydantic_extra__ fields. If model_config.extra == ‘ignore’ (the default), then all extra passed values are ignored. Because no validation is performed with a call to model_construct(), having model_config.extra == ‘forbid’ does not result in an error if extra values are passed, but they will be ignored.

Parameters:

_fields_set – A set of field names that were originally explicitly set during instantiation. If provided, this is directly used for the model_fields_set (pydantic.BaseModel.model_fields_set) attribute. Otherwise, the field names from the values argument will be used.
values – Trusted or pre-validated data dictionary.

Returns:

A new instance of the Model class with validated data.

model_copy(*, update: Mapping[str, Any] | None = None, deep: bool = False) → Self

Usage documentation: see “model_copy” in the Pydantic serialization documentation.

Returns a copy of the model.

Note: The underlying instance’s __dict__ (object.__dict__) attribute is copied. This might have unexpected side effects if you store anything in it on top of the model fields (e.g. the value of cached properties, functools.cached_property).

Parameters:

update – Values to change/add in the new model. Note: the data is not validated before creating the new model. You should trust this data.
deep – Set to True to make a deep copy of the model.

Returns:

New model instance.

Usage documentation: see “model_dump” in the Pydantic serialization documentation.

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

Parameters:

mode – The mode in which to_python should run. If mode is ‘json’, the output will only contain JSON-serializable types. If mode is ‘python’, the output may contain non-JSON-serializable Python objects.
include – A set of fields to include in the output.
exclude – A set of fields to exclude from the output.
context – Additional context to pass to the serializer.
by_alias – Whether to use the field’s alias in the dictionary key if defined.
exclude_unset – Whether to exclude fields that have not been explicitly set.
exclude_defaults – Whether to exclude fields that are set to their default value.
exclude_none – Whether to exclude fields that have a value of None.
round_trip – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings – How to handle serialization errors. False/“none” ignores them, True/“warn” logs errors, and “error” raises a PydanticSerializationError (pydantic_core.PydanticSerializationError).
fallback – A function to call when an unknown value is encountered. If not provided, a PydanticSerializationError (pydantic_core.PydanticSerializationError) is raised.
serialize_as_any – Whether to serialize fields with duck-typing serialization behavior.

Returns:

A dictionary representation of the model.

Usage documentation: see “model_dump_json” in the Pydantic serialization documentation.

Generates a JSON representation of the model using Pydantic’s to_json method.

Parameters:

indent – Indentation to use in the JSON output. If None is passed, the output will be compact.
include – Field(s) to include in the JSON output.
exclude – Field(s) to exclude from the JSON output.
context – Additional context to pass to the serializer.
by_alias – Whether to serialize using field aliases.
exclude_unset – Whether to exclude fields that have not been explicitly set.
exclude_defaults – Whether to exclude fields that are set to their default value.
exclude_none – Whether to exclude fields that have a value of None.
round_trip – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings – How to handle serialization errors. False/“none” ignores them, True/“warn” logs errors, and “error” raises a PydanticSerializationError (pydantic_core.PydanticSerializationError).
fallback – A function to call when an unknown value is encountered. If not provided, a PydanticSerializationError (pydantic_core.PydanticSerializationError) is raised.
serialize_as_any – Whether to serialize fields with duck-typing serialization behavior.

Returns:

A JSON string representation of the model.

Generates a JSON schema for a model class.

Parameters:

by_alias – Whether to use attribute aliases or not.
ref_template – The reference template.
schema_generator – To override the logic used to generate the JSON schema, pass a subclass of GenerateJsonSchema with your desired modifications.
mode – The mode in which to generate the schema.

Returns:

The JSON schema for the given model class.

classmethod model_parametrized_name(params: tuple[type[Any], ...]) → str

Compute the class name for parametrizations of generic classes.

This method can be overridden to achieve a custom naming scheme for generic BaseModels.

Parameters: params – Tuple of types of the class. Given a generic class Model with 2 type variables and a concrete model Model[str, int], the value (str, int) would be passed to params.
Returns: String representing the new class where params are passed to cls as type variables.
Raises: TypeError – Raised when trying to generate concrete names for non-generic models.

model_post_init(context: Any, /) → None: Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

classmethod model_rebuild(*, force: bool = False, raise_errors: bool = True, _parent_namespace_depth: int = 2, _types_namespace: MappingNamespace | None = None) → bool | None

Try to rebuild the pydantic-core schema for the model.

This may be necessary when one of the annotations is a ForwardRef which could not be resolved during the initial attempt to build the schema, and automatic rebuilding fails.

Parameters:

force – Whether to force the rebuilding of the model schema, defaults to False.
raise_errors – Whether to raise errors, defaults to True.
_parent_namespace_depth – The depth level of the parent namespace, defaults to 2.
_types_namespace – The types namespace, defaults to None.

Returns:

Returns None if the schema is already “complete” and rebuilding was not required. If rebuilding was required, returns True if rebuilding was successful, otherwise False.

Validate a pydantic model instance.

Parameters:

obj – The object to validate.
strict – Whether to enforce types strictly.
from_attributes – Whether to extract data from object attributes.
context – Additional context to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Raises:

ValidationError – If the object could not be validated.

Returns:

The validated model instance.

Usage documentation: see “JSON Parsing” in the Pydantic documentation.

Validate the given JSON data against the Pydantic model.

Parameters:

json_data – The JSON data to validate.
strict – Whether to enforce types strictly.
context – Extra variables to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Returns:

The validated Pydantic model.

Raises:

ValidationError – If json_data is not a JSON string or the object could not be validated.

classmethod model_validate_strings(obj: Any, *, strict: bool | None = None, context: Any | None = None, by_alias: bool | None = None, by_name: bool | None = None) → Self

Validate the given object with string data against the Pydantic model.

Parameters:

obj – The object containing string data to validate.
strict – Whether to enforce types strictly.
context – Extra variables to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Returns:

The validated Pydantic model.

classmethod parse_file(path: str | Path, *, content_type: str | None = None, encoding: str = 'utf8', proto: DeprecatedParseProtocol | None = None, allow_pickle: bool = False) → Self

classmethod parse_obj(obj: Any) → Self

classmethod parse_raw(b: str | bytes, *, content_type: str | None = None, encoding: str = 'utf8', proto: DeprecatedParseProtocol | None = None, allow_pickle: bool = False) → Self

classmethod schema(by_alias: bool = True, ref_template: str = '#/$defs/{model}') → Dict[str, Any]

classmethod schema_json(
*,
by_alias: bool = True,
ref_template: str = '#/$defs/{model}',
**dumps_kwargs: Any,
) → str

classmethod update_forward_refs(**localns: Any) → None

classmethod validate(value: Any) → Self

model_computed_fields = {}

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}: Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).

property model_extra: dict[str, Any] | None

Get extra fields set during validation.

Returns:: A dictionary of extra fields, orNone ifconfig.extra is not set to“allow”.

model_fields={'backend':FieldInfo(annotation=Literal['CUTLASS','CUTEDSL','WIDEEP','TRTLLM','DEEPGEMM','VANILLA','TRITON'],required=False,default='CUTLASS',description='MoEbackendtouse.'),'disable_finalize_fusion':FieldInfo(annotation=bool,required=False,default=False,description='DisableFC2+finalizekernelfusioninCUTLASSMoEbackend.SettingthistoTruerecoversdeterministicnumericalbehaviorwithtop-k>2.'),'load_balancer':FieldInfo(annotation=Union[object,str,NoneType],required=False,default=None,description='ConfigurationforMoEloadbalancing.',json_schema_extra={'type':'Union[MoeLoadBalancerConfig,dict,str]'}),'max_num_tokens':FieldInfo(annotation=Union[int,NoneType],required=False,default=None,description='Ifset,atmostmax_num_tokenstokenswillbesenttotorch.ops.trtllm.fused_moeatthesametime.Ifthenumberoftokensexceedsmax_num_tokens,theinputtensorswillbesplitintochunksandaforloopwillbeused.'),'use_low_precision_moe_combine':FieldInfo(annotation=bool,required=False,default=False,description='UselowprecisioncombineinMoEoperations(onlyforNVFP4quantization).Whenenabled,useslowerprecisionforcombiningexpertoutputstoimproveperformance.')}#
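The max_num_tokens description implies a chunking loop around the fused MoE call. The sketch below illustrates that idea on plain Python lists; `run_moe_in_chunks` and the stub kernel are hypothetical stand-ins, not the torch.ops.trtllm.fused_moe interface, which operates on tensors.

```python
from typing import Callable, List, Optional, Sequence


def run_moe_in_chunks(
    tokens: Sequence[int],
    max_num_tokens: Optional[int],
    fused_moe: Callable[[Sequence[int]], List[int]],
) -> List[int]:
    """Hypothetical sketch: send at most max_num_tokens tokens per call."""
    if max_num_tokens is not None and max_num_tokens <= 0:
        raise ValueError("max_num_tokens must be positive")
    # No limit configured, or the input already fits: a single call.
    if max_num_tokens is None or len(tokens) <= max_num_tokens:
        return list(fused_moe(tokens))
    outputs: List[int] = []
    # Otherwise split the input into chunks and loop over them.
    for start in range(0, len(tokens), max_num_tokens):
        outputs.extend(fused_moe(tokens[start:start + max_num_tokens]))
    return outputs


calls: List[int] = []


def fused_moe_stub(chunk: Sequence[int]) -> List[int]:
    calls.append(len(chunk))          # record how many tokens this call received
    return [t * 2 for t in chunk]     # placeholder computation


run_moe_in_chunks(list(range(10)), 4, fused_moe_stub)
print(calls)  # [4, 4, 2]
```

Chunking bounds the size of the largest single call at the cost of extra kernel launches, which is presumably why the field defaults to None (no limit).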

property model_fields_set: set[str]

Returns the set of fields that have been explicitly set on this model instance.

Returns:

A set of strings representing the fields that have been set, i.e. that were not filled from defaults.

class tensorrt_llm.llmapi.LookaheadDecodingConfig(
*,
max_draft_len: int | None = None,
max_total_draft_tokens: int | None = None,
speculative_model_dir: str | Path | None = None,
max_concurrency: int | None = None,
draft_len_schedule: dict[int, int] | None = None,
load_format: str | None = None,
acceptance_window: int | None = None,
acceptance_length_threshold: float | None = None,
max_window_size: int = 4,
max_ngram_size: int = 3,
max_verification_set_size: int = 4,
)

Bases: DecodingBaseConfig, PybindMirror

Configuration for lookahead speculative decoding.

field acceptance_length_threshold: float | None = None

field acceptance_window: int | None = None

field draft_len_schedule: dict[int, int] | None = None

field load_format: str | None = None

field max_concurrency: int | None = None

field max_draft_len: int | None = None

field max_ngram_size: int = 3

Number of tokens per NGram.

field max_total_draft_tokens: int | None = None

field max_verification_set_size: int = 4

Number of NGrams in verification branch per step.

field max_window_size: int = 4

Number of NGrams in lookahead branch per step.

field speculative_model_dir: str | Path | None = None
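To make the n-gram fields more concrete, the sketch below shows a simplified n-gram pool in the spirit of lookahead decoding: n-grams of max_ngram_size tokens are stored keyed by their first token, and at most max_verification_set_size of them are proposed as verification candidates when the last generated token matches. This is an illustrative assumption, not TensorRT-LLM's implementation; it omits the lookahead branch governed by max_window_size, and `NGramPool` is a hypothetical name.

```python
from collections import OrderedDict, defaultdict
from typing import Dict, List, Sequence


class NGramPool:
    """Hypothetical n-gram pool feeding the verification branch."""

    def __init__(self, max_ngram_size: int = 3, max_verification_set_size: int = 4) -> None:
        if max_ngram_size < 2 or max_verification_set_size < 1:
            raise ValueError("need max_ngram_size >= 2 and max_verification_set_size >= 1")
        self.n = max_ngram_size
        self.g = max_verification_set_size
        # first token -> n-grams starting with it, oldest first, no duplicates
        self.pool: Dict[int, OrderedDict] = defaultdict(OrderedDict)

    def add_sequence(self, tokens: Sequence[int]) -> None:
        # Slide a window of n tokens over the sequence and store each n-gram.
        for i in range(len(tokens) - self.n + 1):
            ngram = tuple(tokens[i:i + self.n])
            bucket = self.pool[ngram[0]]
            bucket.pop(ngram, None)  # refresh recency if already present
            bucket[ngram] = None
            # Keep at most g candidates per key, dropping the oldest.
            while len(bucket) > self.g:
                bucket.popitem(last=False)

    def candidates(self, last_token: int) -> List[List[int]]:
        # Each candidate drops the matched first token, leaving n - 1 draft tokens.
        bucket = self.pool.get(last_token, OrderedDict())
        return [list(ngram[1:]) for ngram in bucket]


pool = NGramPool(max_ngram_size=3, max_verification_set_size=2)
pool.add_sequence([5, 1, 2, 5, 1, 3, 5, 1, 4])
print(pool.candidates(5))  # [[1, 3], [1, 4]]
```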

class Config

Bases: object

extra = 'forbid'

__init__(**data)

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

calculate_speculative_resource()

classmethod construct(
_fields_set: set[str] | None = None,
**values: Any,
) → Self

Returns a copy of the model.

Warning: This method is deprecated; use model_copy instead.

If you need include or exclude, use:

```python
data = self.model_dump(include=include, exclude=exclude, round_trip=True)
data = {**data, **(update or {})}
copied = self.model_validate(data)
```

Parameters:

include – Optional set or mapping specifying which fields to include in the copied model.
exclude – Optional set or mapping specifying which fields to exclude in the copied model.
update – Optional dictionary of field-value pairs to override field values in the copied model.
deep – If True, the values of fields that are Pydantic models will be deep-copied.

Returns:

A copy of the model with included, excluded and updated fields as specified.

dict(
*,
include: set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None = None,
exclude: set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None = None,
by_alias: bool = False,
exclude_unset: bool = False,
exclude_defaults: bool = False,
exclude_none: bool = False,
) → Dict[str, Any]

classmethod from_dict(data: dict)

classmethod from_orm(obj: Any) → Self

classmethod from_pybind(pybind_instance: PybindMirror) → T

Construct an instance of the given class from the fields in the given pybind class instance.

Parameters:

cls – Type of the class to construct; must be a subclass of pydantic BaseModel.
pybind_instance – Instance of the pybind class to construct from its fields.

Notes

When a field value is None in the pybind class but the field is not optional and has a default value in the BaseModel class, it gets the default value defined in the BaseModel class.

Returns: Instance of the given class, populated with the fields of the given pybind instance.
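The fallback rule in the note can be sketched with the standard library, using a dataclass in place of a pydantic BaseModel and a plain object in place of a pybind instance. `from_mirror`, `TargetConfig`, and `SourceConfig` are hypothetical names; default_factory fields are not handled in this sketch.

```python
import dataclasses
import typing
from typing import Optional, Type, TypeVar

T = TypeVar("T")


def _is_optional(annotation) -> bool:
    # Optional[X] is Union[X, None], so NoneType appears among its args.
    return type(None) in typing.get_args(annotation)


def from_mirror(cls: Type[T], source: object) -> T:
    """Hypothetical sketch: build cls from same-named attributes on source."""
    hints = typing.get_type_hints(cls)
    kwargs = {}
    for field in dataclasses.fields(cls):
        value = getattr(source, field.name, None)
        has_default = field.default is not dataclasses.MISSING
        if value is None and has_default and not _is_optional(hints[field.name]):
            # Non-optional field with a default: fall back to that default.
            value = field.default
        kwargs[field.name] = value
    return cls(**kwargs)


@dataclasses.dataclass
class TargetConfig:
    max_window_size: int = 4
    max_draft_len: Optional[int] = None


class SourceConfig:  # stands in for a pybind instance
    max_window_size = None
    max_draft_len = None


print(from_mirror(TargetConfig, SourceConfig()))
# TargetConfig(max_window_size=4, max_draft_len=None)
```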

static get_pybind_enum_fields(pybind_class)

Get all the enum fields from the pybind class.

static get_pybind_variable_fields(config_cls)

Get all the variable fields from the pybind class.

json(
*,
include: set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None = None,
exclude: set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None = None,
by_alias: bool = False,
exclude_unset: bool = False,
exclude_defaults: bool = False,
exclude_none: bool = False,
encoder: Callable[[Any], Any] | None = PydanticUndefined,
models_as_dict: bool = PydanticUndefined,
**dumps_kwargs: Any,
) → str

static maybe_to_pybind(ins)

static mirror_pybind_enum(pybind_class)

Mirror the enum fields from the pybind class to the Python class.

static mirror_pybind_fields(pybind_class)

Class decorator that ensures Python class fields mirror those of a C++ class.

Parameters: pybind_class – The C++ class whose fields should be mirrored.
Returns: A decorator function that validates field mirroring.
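A class decorator of this kind can be sketched as follows. The sketch assumes the C++ fields show up as public, non-callable attributes (for example properties) on the bound class and compares them with the Python class's annotations; `mirror_fields` and `CppLikeConfig` are hypothetical, and the real helper may inspect pybind classes differently.

```python
from typing import Callable, Set, Type


def _public_data_attrs(cls: type) -> Set[str]:
    # Public attributes that are not methods; properties are not callable.
    return {
        name for name in dir(cls)
        if not name.startswith("_") and not callable(getattr(cls, name))
    }


def mirror_fields(source_cls: type) -> Callable[[Type], Type]:
    """Hypothetical decorator: require the Python class to declare the same fields."""
    expected = _public_data_attrs(source_cls)

    def decorator(py_cls: Type) -> Type:
        declared = set(getattr(py_cls, "__annotations__", {}))
        missing = expected - declared
        extra = declared - expected
        if missing or extra:
            raise TypeError(
                f"{py_cls.__name__} does not mirror {source_cls.__name__}: "
                f"missing={sorted(missing)}, extra={sorted(extra)}"
            )
        return py_cls

    return decorator


class CppLikeConfig:  # stands in for a pybind-bound C++ class
    @property
    def window_size(self) -> int:
        return 4

    @property
    def ngram_size(self) -> int:
        return 3

    def to_string(self) -> str:  # methods are ignored by the check
        return "cfg"


@mirror_fields(CppLikeConfig)
class PyConfig:
    window_size: int = 4
    ngram_size: int = 3
```

Running the check at class-definition time means a field added on the C++ side fails loudly at import rather than being silently dropped.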

classmethod model_construct(
_fields_set: set[str] | None = None,
**values: Any,
) → Self

Creates a new instance of the Model class with validated data.

Creates a new model setting __dict__ and __pydantic_fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed.

Note: model_construct() generally respects the model_config.extra setting on the provided model. That is, if model_config.extra == 'allow', then all extra passed values are added to the model instance's __dict__ and __pydantic_extra__ fields. If model_config.extra == 'ignore' (the default), then all extra passed values are ignored. Because no validation is performed with a call to model_construct(), having model_config.extra == 'forbid' does not result in an error if extra values are passed, but they will be ignored.

Parameters:

_fields_set – A set of field names that were originally explicitly set during instantiation. If provided, this is directly used for the model_fields_set attribute. Otherwise, the field names from the values argument will be used.
values – Trusted or pre-validated data dictionary.

Returns:

A new instance of the Model class with validated data.

model_copy(*, update: Mapping[str, Any] | None = None, deep: bool = False) → Self

See the Pydantic serialization documentation for model_copy.

Returns a copy of the model.

Note: The underlying instance's __dict__ attribute is copied. This might have unexpected side effects if you store anything in it on top of the model fields (e.g. the value of cached properties, functools.cached_property).

Parameters:

update – Values to change/add in the new model. Note: the data is not validated before creating the new model. You should trust this data.
deep – Set to True to make a deep copy of the model.

Returns:

New model instance.
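The __dict__ caveat can be seen with a plain Python class rather than a pydantic model: functools.cached_property stores its value in the instance __dict__, so a shallow copy carries the cached value along even after the field it was derived from changes. The `Example` class is hypothetical.

```python
import copy
from functools import cached_property


class Example:
    """Plain class (hypothetical) with one field and one cached derived value."""

    def __init__(self, size: int) -> None:
        self.size = size

    @cached_property
    def doubled(self) -> int:
        # Computed once, then stored in self.__dict__["doubled"].
        return self.size * 2


original = Example(size=3)
print(original.doubled)      # 6, and the value is now cached
clone = copy.copy(original)  # shallow copy of __dict__, cached value included
clone.size = 5
print(clone.doubled)         # 6: stale, because the cached entry was copied too
del clone.doubled            # drop the cached entry on the copy
print(clone.doubled)         # 10
```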

See the Pydantic serialization documentation for model_dump.

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

Parameters:

mode – The mode in which to_python should run. If mode is 'json', the output will only contain JSON serializable types. If mode is 'python', the output may contain non-JSON-serializable Python objects.
include – A set of fields to include in the output.
exclude – A set of fields to exclude from the output.
context – Additional context to pass to the serializer.
by_alias – Whether to use the field’s alias in the dictionary key if defined.
exclude_unset – Whether to exclude fields that have not been explicitly set.
exclude_defaults – Whether to exclude fields that are set to their default value.
exclude_none – Whether to exclude fields that have a value of None.
round_trip – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings – How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors, "error" raises a PydanticSerializationError.
fallback – A function to call when an unknown value is encountered. If not provided, a PydanticSerializationError error is raised.
serialize_as_any – Whether to serialize fields with duck-typing serialization behavior.

Returns:

A dictionary representation of the model.

See the Pydantic serialization documentation for model_dump_json.

Generates a JSON representation of the model using Pydantic's to_json method.

Parameters:

indent – Indentation to use in the JSON output. If None is passed, the output will be compact.
include – Field(s) to include in the JSON output.
exclude – Field(s) to exclude from the JSON output.
context – Additional context to pass to the serializer.
by_alias – Whether to serialize using field aliases.
exclude_unset – Whether to exclude fields that have not been explicitly set.
exclude_defaults – Whether to exclude fields that are set to their default value.
exclude_none – Whether to exclude fields that have a value of None.
round_trip – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings – How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors, "error" raises a PydanticSerializationError.
fallback – A function to call when an unknown value is encountered. If not provided, a PydanticSerializationError error is raised.
serialize_as_any – Whether to serialize fields with duck-typing serialization behavior.

Returns:

A JSON string representation of the model.

Generates a JSON schema for a model class.

Parameters:

by_alias – Whether to use attribute aliases or not.
ref_template – The reference template.
schema_generator – To override the logic used to generate the JSON schema, as a subclass of GenerateJsonSchema with your desired modifications.
mode – The mode in which to generate the schema.

Returns:

The JSON schema for the given model class.

classmethod model_parametrized_name(params: tuple[type[Any], ...]) → str

Compute the class name for parametrizations of generic classes.

This method can be overridden to achieve a custom naming scheme for generic BaseModels.

Parameters: params – Tuple of types of the class. Given a generic class Model with 2 type variables and a concrete model Model[str, int], the value (str, int) would be passed to params.
Returns: String representing the new class where params are passed to cls as type variables.
Raises: TypeError – Raised when trying to generate concrete names for non-generic models.
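The naming rule can be sketched without pydantic. `parametrized_name` is a hypothetical helper that formats a name such as Model[str, int] from a class name and a params tuple, treating an empty tuple as the non-generic case; pydantic's own default may format some types differently.

```python
from typing import Any, Tuple


def parametrized_name(cls_name: str, params: Tuple[Any, ...]) -> str:
    """Hypothetical sketch: build a display name such as 'Model[str, int]'."""
    if not params:
        # Stand-in for the "non-generic model" error case.
        raise TypeError(f"{cls_name} is not generic; cannot parametrize it")
    # Use __name__ for classes such as str and int, repr() for anything else.
    parts = [getattr(p, "__name__", repr(p)) for p in params]
    return f"{cls_name}[{', '.join(parts)}]"


print(parametrized_name("Model", (str, int)))  # Model[str, int]
```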

model_post_init(context: Any, /) → None

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:

self – The BaseModel instance.
context – The context.

classmethod model_rebuild(*, force: bool = False, raise_errors: bool = True, _parent_namespace_depth: int = 2, _types_namespace: MappingNamespace | None = None) → bool | None

Try to rebuild the pydantic-core schema for the model.

This may be necessary when one of the annotations is a ForwardRef which could not be resolved during the initial attempt to build the schema, and automatic rebuilding fails.

Parameters:

force – Whether to force the rebuilding of the model schema, defaults to False.
raise_errors – Whether to raise errors, defaults to True.
_parent_namespace_depth – The depth level of the parent namespace, defaults to 2.
_types_namespace – The types namespace, defaults to None.

Returns:

Returns None if the schema is already "complete" and rebuilding was not required. If rebuilding was required, returns True if rebuilding was successful, otherwise False.

Validate a pydantic model instance.

Parameters:

obj – The object to validate.
strict – Whether to enforce types strictly.
from_attributes – Whether to extract data from object attributes.
context – Additional context to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Raises:

ValidationError – If the object could not be validated.

Returns:

The validated model instance.

See the Pydantic documentation on JSON parsing.

Validate the given JSON data against the Pydantic model.

Parameters:

json_data – The JSON data to validate.
strict – Whether to enforce types strictly.
context – Extra variables to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Returns:

The validated Pydantic model.

Raises:

ValidationError – If json_data is not a JSON string or the object could not be validated.

classmethod model_validate_strings(obj: Any, *, strict: bool | None = None, context: Any | None = None, by_alias: bool | None = None, by_name: bool | None = None) → Self

Validate the given object with string data against the Pydantic model.

Parameters:

obj – The object containing string data to validate.
strict – Whether to enforce types strictly.
context – Extra variables to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Returns:

The validated Pydantic model.

classmethod parse_file(path: str | Path, *, content_type: str | None = None, encoding: str = 'utf8', proto: DeprecatedParseProtocol | None = None, allow_pickle: bool = False) → Self

classmethod parse_obj(obj: Any) → Self

classmethod parse_raw(b: str | bytes, *, content_type: str | None = None, encoding: str = 'utf8', proto: DeprecatedParseProtocol | None = None, allow_pickle: bool = False) → Self

static pybind_equals(obj0, obj1)

Check if two pybind objects are equal.
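One plausible way to compare two bound objects is attribute by attribute. The sketch assumes equality means the same type and equal values for every public, non-callable attribute; `attrs_equal` is a hypothetical helper, and what pybind_equals actually compares may differ.

```python
from typing import Any


def attrs_equal(obj0: Any, obj1: Any) -> bool:
    """Hypothetical sketch: compare public, non-callable attributes."""
    if type(obj0) is not type(obj1):
        return False
    names = [
        n for n in dir(obj0)
        if not n.startswith("_") and not callable(getattr(obj0, n))
    ]
    # Equal only if every public data attribute compares equal.
    return all(getattr(obj0, n) == getattr(obj1, n) for n in names)
```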

classmethod schema(by_alias: bool = True, ref_template: str = '#/$defs/{model}') → Dict[str, Any]

classmethod schema_json(
*,
by_alias: bool = True,
ref_template: str = '#/$defs/{model}',
**dumps_kwargs: Any,
) → str

supports_backend(backend: str) → bool

Override this method if the speculation algorithm supports only a subset of the possible backends.
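The override pattern can be sketched as follows. The base-class default of accepting every backend, the class names, and the backend strings "pytorch" and "tensorrt" are assumptions made for illustration only.

```python
class BaseSpecConfig:
    """Hypothetical base: every backend is supported unless a subclass says otherwise."""

    def supports_backend(self, backend: str) -> bool:
        return True


class PyTorchOnlySpecConfig(BaseSpecConfig):
    """Hypothetical algorithm that only runs on one backend."""

    _SUPPORTED = frozenset({"pytorch"})

    def supports_backend(self, backend: str) -> bool:
        # Restrict support to the listed backends.
        return backend in self._SUPPORTED


def check_backend(config: BaseSpecConfig, backend: str) -> None:
    # Fail at configuration time rather than partway through generation.
    if not config.supports_backend(backend):
        raise ValueError(f"{type(config).__name__} does not support backend {backend!r}")
```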

classmethod update_forward_refs(
**localns: Any,
) → None

validate() → None

Do any additional error checking here.

validator validate_draft_len_schedule_and_sort » draft_len_schedule

Validate and sort draft_len_schedule by batch size thresholds.
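The validator's one-line description suggests the behavior sketched below, under stated assumptions: keys are batch-size thresholds that must be positive integers, values are draft lengths that must be non-negative integers, and the result is a dict ordered by ascending threshold. The real validator's bounds may differ, and `normalize_draft_len_schedule` is a hypothetical name.

```python
from typing import Dict, Optional


def normalize_draft_len_schedule(
    schedule: Optional[Dict[int, int]],
) -> Optional[Dict[int, int]]:
    """Hypothetical sketch: validate {batch_size_threshold: draft_len} and sort it."""
    if schedule is None:
        return None
    for threshold, draft_len in schedule.items():
        if not isinstance(threshold, int) or threshold < 1:
            raise ValueError(f"batch size threshold must be a positive int, got {threshold!r}")
        if not isinstance(draft_len, int) or draft_len < 0:
            raise ValueError(f"draft length must be a non-negative int, got {draft_len!r}")
    # Dicts preserve insertion order, so rebuilding from sorted items sorts the schedule.
    return dict(sorted(schedule.items()))


print(normalize_draft_len_schedule({32: 1, 1: 4, 8: 2}))  # {1: 4, 8: 2, 32: 1}
```

Sorting once at validation time lets later lookups scan thresholds in order without re-sorting on every step.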

validator validate_positive_values » max_window_size, max_verification_set_size, max_ngram_size
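The validator's name and field list suggest a positivity check on the three lookahead sizes. A stdlib sketch of such a check follows; "strictly positive" is an assumption, since the exact bounds enforced are not stated here, and `check_positive` is hypothetical.

```python
from typing import Mapping, Optional

LOOKAHEAD_FIELDS = ("max_window_size", "max_verification_set_size", "max_ngram_size")


def check_positive(values: Mapping[str, Optional[int]]) -> None:
    """Hypothetical sketch: reject non-positive lookahead sizes."""
    for name in LOOKAHEAD_FIELDS:
        value = values.get(name)
        # Missing or None values are skipped; only explicit non-positive ints fail.
        if value is not None and value <= 0:
            raise ValueError(f"{name} must be positive, got {value}")


check_positive({"max_window_size": 4, "max_verification_set_size": 4, "max_ngram_size": 3})
```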

decoding_type: ClassVar[str] = 'Lookahead'

model_computed_fields = {}

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).

property model_extra: dict[str, Any] | None

Get extra fields set during validation.

Returns: A dictionary of extra fields, or None if config.extra is not set to "allow".

model_fields = {
'acceptance_length_threshold': FieldInfo(annotation=Union[float, NoneType], required=False, default=None),
'acceptance_window': FieldInfo(annotation=Union[int, NoneType], required=False, default=None),
'draft_len_schedule': FieldInfo(annotation=Union[dict[int, int], NoneType], required=False, default=None),
'load_format': FieldInfo(annotation=Union[str, NoneType], required=False, default=None),
'max_concurrency': FieldInfo(annotation=Union[int, NoneType], required=False, default=None),
'max_draft_len': FieldInfo(annotation=Union[int, NoneType], required=False, default=None),
'max_ngram_size': FieldInfo(annotation=int, required=False, default=3, description='Number of tokens per NGram.'),
'max_total_draft_tokens': FieldInfo(annotation=Union[int, NoneType], required=False, default=None),
'max_verification_set_size': FieldInfo(annotation=int, required=False, default=4, description='Number of NGrams in verification branch per step.'),
'max_window_size': FieldInfo(annotation=int, required=False, default=4, description='Number of NGrams in lookahead branch per step.'),
'speculative_model_dir': FieldInfo(annotation=Union[str, Path, NoneType], required=False, default=None),
}

property model_fields_set: set[str]

Returns the set of fields that have been explicitly set on this model instance.

Returns:

A set of strings representing the fields that have been set, i.e. that were not filled from defaults.

property spec_dec_mode

Bases: DecodingBaseConfig

field acceptance_length_threshold: float | None = None

field acceptance_window: int | None = None

field draft_len_schedule: dict[int, int] | None = None

field load_format: str | None = None

field max_concurrency: int | None = None

field max_draft_len: int | None = None

field max_total_draft_tokens: int | None = None

field medusa_choices: List[List[int]] | None = None

field num_medusa_heads: int | None = None

field speculative_model_dir: str | Path | None = None

class Config

Bases: object

extra = 'forbid'

__init__(**kwargs)

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

classmethod construct(
_fields_set: set[str] | None = None,
**values: Any,
) → Self

Returns a copy of the model.

Warning: This method is deprecated; use model_copy instead.

If you need include or exclude, use:

```python
data = self.model_dump(include=include, exclude=exclude, round_trip=True)
data = {**data, **(update or {})}
copied = self.model_validate(data)
```

Parameters:

include – Optional set or mapping specifying which fields to include in the copied model.
exclude – Optional set or mapping specifying which fields to exclude in the copied model.
update – Optional dictionary of field-value pairs to override field values in the copied model.
deep – If True, the values of fields that are Pydantic models will be deep-copied.

Returns:

A copy of the model with included, excluded and updated fields as specified.

dict(
*,
include: set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None = None,
exclude: set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None = None,
by_alias: bool = False,
exclude_unset: bool = False,
exclude_defaults: bool = False,
exclude_none: bool = False,
) → Dict[str, Any]

classmethod from_dict(data: dict)

classmethod from_orm(obj: Any) → Self

json(
*,
include: set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None = None,
exclude: set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None = None,
by_alias: bool = False,
exclude_unset: bool = False,
exclude_defaults: bool = False,
exclude_none: bool = False,
encoder: Callable[[Any], Any] | None = PydanticUndefined,
models_as_dict: bool = PydanticUndefined,
**dumps_kwargs: Any,
) → str

classmethod model_construct(
_fields_set: set[str] | None = None,
**values: Any,
) → Self

Creates a new instance of the Model class with validated data.

Creates a new model setting __dict__ and __pydantic_fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed.

Note: model_construct() generally respects the model_config.extra setting on the provided model. That is, if model_config.extra == 'allow', then all extra passed values are added to the model instance's __dict__ and __pydantic_extra__ fields. If model_config.extra == 'ignore' (the default), then all extra passed values are ignored. Because no validation is performed with a call to model_construct(), having model_config.extra == 'forbid' does not result in an error if extra values are passed, but they will be ignored.

Parameters:

_fields_set – A set of field names that were originally explicitly set during instantiation. If provided, this is directly used for the model_fields_set attribute. Otherwise, the field names from the values argument will be used.
values – Trusted or pre-validated data dictionary.

Returns:

A new instance of the Model class with validated data.

model_copy(*, update: Mapping[str, Any] | None = None, deep: bool = False) → Self

See the Pydantic serialization documentation for model_copy.

Returns a copy of the model.

Note: The underlying instance's __dict__ attribute is copied. This might have unexpected side effects if you store anything in it on top of the model fields (e.g. the value of cached properties, functools.cached_property).

Parameters:

update – Values to change/add in the new model. Note: the data is not validated before creating the new model. You should trust this data.
deep – Set to True to make a deep copy of the model.

Returns:

New model instance.

See the Pydantic serialization documentation for model_dump.

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

Parameters:

mode – The mode in which to_python should run. If mode is 'json', the output will only contain JSON serializable types. If mode is 'python', the output may contain non-JSON-serializable Python objects.
include – A set of fields to include in the output.
exclude – A set of fields to exclude from the output.
context – Additional context to pass to the serializer.
by_alias – Whether to use the field’s alias in the dictionary key if defined.
exclude_unset – Whether to exclude fields that have not been explicitly set.
exclude_defaults – Whether to exclude fields that are set to their default value.
exclude_none – Whether to exclude fields that have a value of None.
round_trip – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings – How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors, "error" raises a PydanticSerializationError.
fallback – A function to call when an unknown value is encountered. If not provided, a PydanticSerializationError error is raised.
serialize_as_any – Whether to serialize fields with duck-typing serialization behavior.

Returns:

A dictionary representation of the model.

See the Pydantic serialization documentation for model_dump_json.

Generates a JSON representation of the model using Pydantic's to_json method.

Parameters:

indent – Indentation to use in the JSON output. If None is passed, the output will be compact.
include – Field(s) to include in the JSON output.
exclude – Field(s) to exclude from the JSON output.
context – Additional context to pass to the serializer.
by_alias – Whether to serialize using field aliases.
exclude_unset – Whether to exclude fields that have not been explicitly set.
exclude_defaults – Whether to exclude fields that are set to their default value.
exclude_none – Whether to exclude fields that have a value of None.
round_trip – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings – How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors, "error" raises a PydanticSerializationError.
fallback – A function to call when an unknown value is encountered. If not provided, a PydanticSerializationError error is raised.
serialize_as_any – Whether to serialize fields with duck-typing serialization behavior.

Returns:

A JSON string representation of the model.

Generates a JSON schema for a model class.

Parameters:

by_alias – Whether to use attribute aliases or not.
ref_template – The reference template.
schema_generator – To override the logic used to generate the JSON schema, as a subclass of GenerateJsonSchema with your desired modifications.
mode – The mode in which to generate the schema.

Returns:

The JSON schema for the given model class.

classmethod model_parametrized_name(params: tuple[type[Any], ...]) → str

Compute the class name for parametrizations of generic classes.

This method can be overridden to achieve a custom naming scheme for generic BaseModels.

Parameters: params – Tuple of types of the class. Given a generic class Model with 2 type variables and a concrete model Model[str, int], the value (str, int) would be passed to params.
Returns: String representing the new class where params are passed to cls as type variables.
Raises: TypeError – Raised when trying to generate concrete names for non-generic models.

model_post_init(context: Any, /) → None

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:

self – The BaseModel instance.
context – The context.

classmethod model_rebuild(*, force: bool = False, raise_errors: bool = True, _parent_namespace_depth: int = 2, _types_namespace: MappingNamespace | None = None) → bool | None

Try to rebuild the pydantic-core schema for the model.

This may be necessary when one of the annotations is a ForwardRef which could not be resolved during the initial attempt to build the schema, and automatic rebuilding fails.

Parameters:

force – Whether to force the rebuilding of the model schema, defaults to False.
raise_errors – Whether to raise errors, defaults to True.
_parent_namespace_depth – The depth level of the parent namespace, defaults to 2.
_types_namespace – The types namespace, defaults to None.

Returns:

Returns None if the schema is already "complete" and rebuilding was not required. If rebuilding was required, returns True if rebuilding was successful, otherwise False.

Validate a pydantic model instance.

Parameters:

obj – The object to validate.
strict – Whether to enforce types strictly.
from_attributes – Whether to extract data from object attributes.
context – Additional context to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Raises:

ValidationError – If the object could not be validated.

Returns:

The validated model instance.

See the Pydantic documentation on JSON parsing.

Validate the given JSON data against the Pydantic model.

Parameters:

json_data – The JSON data to validate.
strict – Whether to enforce types strictly.
context – Extra variables to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Returns:

The validated Pydantic model.

Raises:

ValidationError – If json_data is not a JSON string or the object could not be validated.

classmethod model_validate_strings(obj: Any, *, strict: bool | None = None, context: Any | None = None, by_alias: bool | None = None, by_name: bool | None = None) → Self

Validate the given object with string data against the Pydantic model.

Parameters:

obj – The object containing string data to validate.
strict – Whether to enforce types strictly.
context – Extra variables to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Returns:

The validated Pydantic model.

classmethod parse_file(path: str | Path, *, content_type: str | None = None, encoding: str = 'utf8', proto: DeprecatedParseProtocol | None = None, allow_pickle: bool = False) → Self

classmethod parse_obj(obj: Any) → Self

classmethod parse_raw(b: str | bytes, *, content_type: str | None = None, encoding: str = 'utf8', proto: DeprecatedParseProtocol | None = None, allow_pickle: bool = False) → Self

classmethod schema(by_alias: bool = True, ref_template: str = '#/$defs/{model}') → Dict[str, Any]

classmethod schema_json(
*,
by_alias: bool = True,
ref_template: str = '#/$defs/{model}',
**dumps_kwargs: Any,
) → str

supports_backend(backend: str) → bool

Override this method if the speculation algorithm supports only a subset of the possible backends.

classmethod update_forward_refs(
**localns: Any,
) → None

validate() → None

Do any additional error checking here.

validator validate_draft_len_schedule_and_sort » draft_len_schedule

Validate and sort draft_len_schedule by batch size thresholds.

decoding_type: ClassVar[str] = 'Medusa'

model_computed_fields = {}

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model, should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).

property model_extra: dict[str, Any] | None

Get extra fields set during validation.

Returns: A dictionary of extra fields, or None if config.extra is not set to "allow".

model_fields = {
'acceptance_length_threshold': FieldInfo(annotation=Union[float, NoneType], required=False, default=None),
'acceptance_window': FieldInfo(annotation=Union[int, NoneType], required=False, default=None),
'draft_len_schedule': FieldInfo(annotation=Union[dict[int, int], NoneType], required=False, default=None),
'load_format': FieldInfo(annotation=Union[str, NoneType], required=False, default=None),
'max_concurrency': FieldInfo(annotation=Union[int, NoneType], required=False, default=None),
'max_draft_len': FieldInfo(annotation=Union[int, NoneType], required=False, default=None),
'max_total_draft_tokens': FieldInfo(annotation=Union[int, NoneType], required=False, default=None),
'medusa_choices': FieldInfo(annotation=Union[List[List[int]], NoneType], required=False, default=None),
'num_medusa_heads': FieldInfo(annotation=Union[int, NoneType], required=False, default=None),
'speculative_model_dir': FieldInfo(annotation=Union[str, Path, NoneType], required=False, default=None),
}

property model_fields_set: set[str]

Returns the set of fields that have been explicitly set on this model instance.

Returns:

A set of strings representing the fields that have been set, i.e. that were not filled from defaults.

property spec_dec_mode

Bases: DecodingBaseConfig

fieldacceptance_length_threshold:float|None=None#

fieldacceptance_window:int|None=None#

fielddraft_len_schedule:dict[int,int]|None=None#

fielddynamic_tree_max_topK:int|None=None#

fieldeagle3_layers_to_capture:Set[int]|None=None#

fieldeagle3_one_model:bool|None=True#

fieldeagle_choices:List[List[int]]|None=None#

fieldgreedy_sampling:bool|None=True#

fieldload_format:str|None=None#

fieldmax_concurrency:int|None=None#

fieldmax_draft_len:int|None=None#

fieldmax_non_leaves_per_layer:int|None=None#

fieldmax_total_draft_tokens:int|None=None#

fieldnum_eagle_layers:int|None=None#

fieldposterior_threshold:float|None=None#

fieldspeculative_model_dir:str|Path|None=None#

fielduse_dynamic_tree:bool|None=False#

class Config

Bases: object

extra = 'forbid'

__init__(**kwargs)

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError (pydantic_core.ValidationError) if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

check_eagle_choices()

classmethod construct(_fields_set: set[str] | None = None, **values: Any) → Self

copy(*, include=None, exclude=None, update=None, deep=False) → Self

Returns a copy of the model.

Warning (deprecated): This method is deprecated; use model_copy instead.

If you need include or exclude, use:

`data = self.model_dump(include=include, exclude=exclude, round_trip=True)`
`data = {**data, **(update or {})}`
`copied = self.model_validate(data)`

Parameters:

include – Optional set or mapping specifying which fields to include in the copied model.
exclude – Optional set or mapping specifying which fields to exclude in the copied model.
update – Optional dictionary of field-value pairs to override field values in the copied model.
deep – If True, the values of fields that are Pydantic models will be deep-copied.

Returns:

A copy of the model with included, excluded and updated fields as specified.
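The replacement recipe for the deprecated copy can be run end to end as follows. This assumes Pydantic v2 and a hypothetical stand-in class mirroring two fields of the Eagle config documented above. Unlike model_copy, the recipe re-validates the merged data, including `update`.

```python
from typing import List, Optional

from pydantic import BaseModel


class EagleLikeConfig(BaseModel):
    """Hypothetical stand-in mirroring two Eagle config fields."""

    max_draft_len: Optional[int] = None
    eagle_choices: Optional[List[List[int]]] = None


cfg = EagleLikeConfig(max_draft_len=3, eagle_choices=[[0], [0, 1]])

# The recipe from the deprecation note, with concrete include/exclude/update values.
include, exclude, update = None, {"eagle_choices"}, {"max_draft_len": 5}
data = cfg.model_dump(include=include, exclude=exclude, round_trip=True)
data = {**data, **(update or {})}
copied = cfg.model_validate(data)  # excluded fields fall back to their defaults

print(copied.max_draft_len, copied.eagle_choices)  # 5 None
```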

dict(*, include: set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None = None, exclude: set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None = None, by_alias: bool = False, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False) → Dict[str, Any]

classmethod from_dict(data: dict)

classmethod from_orm(obj: Any) → Self

json(
*,
include: set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None = None,
exclude: set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None = None,
by_alias: bool = False,
exclude_unset: bool = False,
exclude_defaults: bool = False,
exclude_none: bool = False,
encoder: Callable[[Any], Any] | None = PydanticUndefined,
models_as_dict: bool = PydanticUndefined,
**dumps_kwargs: Any,
) → str

classmethod model_construct(_fields_set: set[str] | None = None, **values: Any) → Self

Creates a new instance of the Model class from trusted, already-validated data.

Creates a new model setting __dict__ and __pydantic_fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed.

Note: model_construct() generally respects the model_config.extra setting on the provided model. That is, if model_config.extra == 'allow', then all extra passed values are added to the model instance's __dict__ and __pydantic_extra__ fields. If model_config.extra == 'ignore' (the default), then all extra passed values are ignored. Because no validation is performed with a call to model_construct(), having model_config.extra == 'forbid' does not result in an error if extra values are passed, but they will be ignored.

Parameters:

_fields_set – A set of field names that were originally explicitly set during instantiation. If provided, this is directly used for the model_fields_set attribute (pydantic.BaseModel.model_fields_set). Otherwise, the field names from the values argument will be used.
values – Trusted or pre-validated data dictionary.

Returns:

A new instance of the Model class, populated without validation.
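The difference between normal construction and model_construct is easiest to see with a value that validation would coerce. The sketch assumes Pydantic v2 and a hypothetical stand-in class mirroring two Eagle config fields.

```python
from typing import Optional

from pydantic import BaseModel


class EagleLikeConfig(BaseModel):
    """Hypothetical stand-in mirroring two Eagle config fields."""

    max_draft_len: Optional[int] = None
    greedy_sampling: Optional[bool] = True


validated = EagleLikeConfig(max_draft_len="3")  # lax mode coerces "3" to 3
constructed = EagleLikeConfig.model_construct(max_draft_len="3")  # stored as-is

print(repr(validated.max_draft_len), repr(constructed.max_draft_len))  # 3 '3'
print(constructed.greedy_sampling)  # True: defaults are still applied
print(constructed.model_fields_set)  # {'max_draft_len'}
```

Use model_construct only for data that is already known to be valid; as shown, a wrong type passes through unchecked.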

model_copy(*, update: Mapping[str, Any] | None = None, deep: bool = False) → Self

Usage documentation: model_copy (Pydantic serialization concepts).

Returns a copy of the model.

Note: The underlying instance's __dict__ attribute (object.__dict__) is copied. This might have unexpected side effects if you store anything in it on top of the model fields (e.g. the value of cached properties, functools.cached_property).

Parameters:

update – Values to change/add in the new model. Note: the data is not validated before creating the new model. You should trust this data.
deep – Set to True to make a deep copy of the model.

Returns:

New model instance.
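Shallow and deep copies differ for mutable field values such as nested lists, and `update` is applied without validation. The sketch assumes Pydantic v2 and a hypothetical stand-in class mirroring two Eagle config fields.

```python
from typing import List, Optional

from pydantic import BaseModel


class EagleLikeConfig(BaseModel):
    """Hypothetical stand-in mirroring two Eagle config fields."""

    max_draft_len: Optional[int] = None
    eagle_choices: Optional[List[List[int]]] = None


cfg = EagleLikeConfig(max_draft_len=3, eagle_choices=[[0], [0, 1]])

shallow = cfg.model_copy(update={"max_draft_len": 4})
deep = cfg.model_copy(deep=True)

print(shallow.eagle_choices is cfg.eagle_choices)  # True: same list object
print(deep.eagle_choices is cfg.eagle_choices)     # False: independent copy

# `update` bypasses validation, so a wrong type is stored without complaint.
unchecked = cfg.model_copy(update={"max_draft_len": "four"})
print(unchecked.max_draft_len)  # four
```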

Usage documentation: model_dump (Pydantic serialization concepts).

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

Parameters:

mode – The mode in which to_python should run. If mode is 'json', the output will only contain JSON-serializable types. If mode is 'python', the output may contain non-JSON-serializable Python objects.
include – A set of fields to include in the output.
exclude – A set of fields to exclude from the output.
context – Additional context to pass to the serializer.
by_alias – Whether to use the field’s alias in the dictionary key if defined.
exclude_unset – Whether to exclude fields that have not been explicitly set.
exclude_defaults – Whether to exclude fields that are set to their default value.
exclude_none – Whether to exclude fields that have a value of None.
round_trip – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings – How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors, "error" raises a PydanticSerializationError (pydantic_core.PydanticSerializationError).
fallback – A function to call when an unknown value is encountered. If not provided, a PydanticSerializationError (pydantic_core.PydanticSerializationError) is raised.
serialize_as_any – Whether to serialize fields with duck-typing serialization behavior.

Returns:

A dictionary representation of the model.
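The three exclude_* flags are easy to confuse: exclude_unset keeps anything passed explicitly, while exclude_defaults drops any value equal to its default, even if it was passed. The sketch assumes Pydantic v2 and a hypothetical stand-in class mirroring four Eagle config fields.

```python
from typing import List, Optional

from pydantic import BaseModel


class EagleLikeConfig(BaseModel):
    """Hypothetical stand-in mirroring four Eagle config fields."""

    max_draft_len: Optional[int] = None
    eagle3_one_model: Optional[bool] = True
    use_dynamic_tree: Optional[bool] = False
    eagle_choices: Optional[List[List[int]]] = None


cfg = EagleLikeConfig(max_draft_len=3, use_dynamic_tree=False)

full = cfg.model_dump()
no_none = cfg.model_dump(exclude_none=True)          # drops eagle_choices
only_set = cfg.model_dump(exclude_unset=True)        # keeps explicitly passed fields
non_default = cfg.model_dump(exclude_defaults=True)  # drops values equal to defaults

print(only_set)     # {'max_draft_len': 3, 'use_dynamic_tree': False}
print(non_default)  # {'max_draft_len': 3}
```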

Usage documentation: model_dump_json (Pydantic serialization concepts).

Generates a JSON representation of the model using Pydantic's to_json method.

Parameters:

indent – Indentation to use in the JSON output. If None is passed, the output will be compact.
include – Field(s) to include in the JSON output.
exclude – Field(s) to exclude from the JSON output.
context – Additional context to pass to the serializer.
by_alias – Whether to serialize using field aliases.
exclude_unset – Whether to exclude fields that have not been explicitly set.
exclude_defaults – Whether to exclude fields that are set to their default value.
exclude_none – Whether to exclude fields that have a value of None.
round_trip – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings – How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors, "error" raises a PydanticSerializationError (pydantic_core.PydanticSerializationError).
fallback – A function to call when an unknown value is encountered. If not provided, a PydanticSerializationError (pydantic_core.PydanticSerializationError) is raised.
serialize_as_any – Whether to serialize fields with duck-typing serialization behavior.

Returns:

A JSON string representation of the model.
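A config serialized with model_dump_json can be read back with model_validate_json. The sketch assumes Pydantic v2 and a hypothetical stand-in class mirroring two Eagle config fields; `indent` only changes layout, not content.

```python
import json
from typing import List, Optional

from pydantic import BaseModel


class EagleLikeConfig(BaseModel):
    """Hypothetical stand-in mirroring two Eagle config fields."""

    max_draft_len: Optional[int] = None
    eagle_choices: Optional[List[List[int]]] = None


cfg = EagleLikeConfig(max_draft_len=3, eagle_choices=[[0], [0, 1]])

compact = cfg.model_dump_json()         # indent=None gives a single line
pretty = cfg.model_dump_json(indent=2)  # multi-line, same content
same_content = json.loads(compact) == json.loads(pretty)

restored = EagleLikeConfig.model_validate_json(compact)
print(same_content, restored == cfg)  # True True
```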

Generates a JSON schema for a model class.

Parameters:

by_alias – Whether to use attribute aliases or not.
ref_template – The reference template.
schema_generator – To override the logic used to generate the JSON schema, as a subclass of GenerateJsonSchema with your desired modifications.
mode – The mode in which to generate the schema.

Returns:

The JSON schema for the given model class.
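The generated schema reflects both field defaults and the `extra='forbid'` setting. The sketch assumes Pydantic v2 and a hypothetical stand-in class mirroring two Eagle config fields.

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict


class EagleLikeConfig(BaseModel):
    """Hypothetical stand-in mirroring two Eagle config fields."""

    model_config = ConfigDict(extra="forbid")

    max_draft_len: Optional[int] = None
    eagle3_one_model: Optional[bool] = True


schema = EagleLikeConfig.model_json_schema()

print(sorted(schema["properties"]))  # ['eagle3_one_model', 'max_draft_len']
print(schema["properties"]["eagle3_one_model"]["default"])  # True
print(schema["additionalProperties"])  # False, from extra="forbid"
```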

classmethod model_parametrized_name(params: tuple[type[Any], ...]) → str

Compute the class name for parametrizations of generic classes.

This method can be overridden to achieve a custom naming scheme for generic BaseModels.

Parameters: params – Tuple of types of the class. Given a generic class Model with 2 type variables and a concrete model Model[str, int], the value (str, int) would be passed to params.
Returns: String representing the new class where params are passed to cls as type variables.
Raises: TypeError – Raised when trying to generate concrete names for non-generic models.
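Overriding this classmethod changes the name Pydantic gives to a parametrized generic model. The config classes on this page are not generic, so the sketch uses hypothetical generic models (`Schedule`, `NamedSchedule`) and assumes Pydantic v2.

```python
from typing import Any, Dict, Generic, Tuple, Type, TypeVar

from pydantic import BaseModel

T = TypeVar("T")


class Schedule(BaseModel, Generic[T]):
    """Generic model using the default naming scheme."""

    entries: Dict[int, T]


class NamedSchedule(BaseModel, Generic[T]):
    """Same model with a custom naming scheme."""

    entries: Dict[int, T]

    @classmethod
    def model_parametrized_name(cls, params: Tuple[Type[Any], ...]) -> str:
        # cls is the generic class; params holds the concrete type arguments.
        return cls.__name__ + "Of" + "".join(p.__name__.title() for p in params)


print(Schedule[int].__name__)       # Schedule[int]
print(NamedSchedule[int].__name__)  # NamedScheduleOfInt
```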

model_post_init(context: Any, /) → None

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:

self – The BaseModel instance.
context – The context.
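A common use of model_post_init is deriving private state from validated fields. The sketch assumes Pydantic v2; the class and the `_max_choice_depth` attribute are hypothetical and not part of the Eagle config.

```python
from typing import Any, List, Optional

from pydantic import BaseModel, PrivateAttr


class EagleLikeConfig(BaseModel):
    """Hypothetical stand-in with one Eagle config field."""

    eagle_choices: Optional[List[List[int]]] = None
    _max_choice_depth: int = PrivateAttr(default=0)  # hypothetical derived value

    def model_post_init(self, context: Any, /) -> None:
        # Called after field validation; private attributes already hold defaults.
        if self.eagle_choices:
            self._max_choice_depth = max(len(path) for path in self.eagle_choices)


print(EagleLikeConfig(eagle_choices=[[0], [0, 1], [0, 1, 2]])._max_choice_depth)  # 3
print(EagleLikeConfig()._max_choice_depth)  # 0
```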

classmethod model_rebuild(*, force: bool = False, raise_errors: bool = True, _parent_namespace_depth: int = 2, _types_namespace: MappingNamespace | None = None) → bool | None

Try to rebuild the pydantic-core schema for the model.

This may be necessary when one of the annotations is a ForwardRef which could not be resolved during the initial attempt to build the schema, and automatic rebuilding fails.

Parameters:

force – Whether to force the rebuilding of the model schema, defaults to False.
raise_errors – Whether to raise errors, defaults to True.
_parent_namespace_depth – The depth level of the parent namespace, defaults to 2.
_types_namespace – The types namespace, defaults to None.

Returns:

Returns None if the schema is already “complete” and rebuilding was not required. If rebuilding was required, returns True if rebuilding was successful, otherwise False.

Validate a pydantic model instance.

Parameters:

obj – The object to validate.
strict – Whether to enforce types strictly.
from_attributes – Whether to extract data from object attributes.
context – Additional context to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Raises:

ValidationError – If the object could not be validated.

Returns:

The validated model instance.
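The `strict` parameter decides whether compatible values are coerced. The sketch assumes Pydantic v2 and a hypothetical stand-in class mirroring one Eagle config field.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class EagleLikeConfig(BaseModel):
    """Hypothetical stand-in mirroring one Eagle config field."""

    max_draft_len: Optional[int] = None


lax = EagleLikeConfig.model_validate({"max_draft_len": "4"})
print(lax.max_draft_len)  # 4: the string is coerced in the default (lax) mode

try:
    EagleLikeConfig.model_validate({"max_draft_len": "4"}, strict=True)
    strict_rejected = False
except ValidationError:
    strict_rejected = True  # strict mode refuses str -> int coercion
print(strict_rejected)  # True
```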

Usage documentation: JSON parsing (Pydantic JSON concepts).

Validate the given JSON data against the Pydantic model.

Parameters:

json_data – The JSON data to validate.
strict – Whether to enforce types strictly.
context – Extra variables to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Returns:

The validated Pydantic model.

Raises:

ValidationError – If json_data is not a JSON string or the object could not be validated.

classmethod model_validate_strings(obj: Any, *, strict: bool | None = None, context: Any | None = None, by_alias: bool | None = None, by_name: bool | None = None) → Self

Validate the given object with string data against the Pydantic model.

Parameters:

obj – The object containing string data to validate.
strict – Whether to enforce types strictly.
context – Extra variables to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Returns:

The validated Pydantic model.

classmethod parse_file(path: str | Path, *, content_type: str | None = None, encoding: str = 'utf8', proto: DeprecatedParseProtocol | None = None, allow_pickle: bool = False) → Self

classmethod parse_obj(obj: Any) → Self

classmethod parse_raw(b: str | bytes, *, content_type: str | None = None, encoding: str = 'utf8', proto: DeprecatedParseProtocol | None = None, allow_pickle: bool = False) → Self

classmethod schema(by_alias: bool = True, ref_template: str = '#/$defs/{model}') → Dict[str, Any]

classmethod schema_json(
*,
by_alias: bool = True,
ref_template: str = '#/$defs/{model}',
**dumps_kwargs: Any,
) → str

supports_backend(backend: str) → bool: Override this if the speculation algorithm supports only a subset of the possible backends.
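The override pattern described for supports_backend can be sketched in plain Python. Everything here is hypothetical: the class names, the `check_backend` helper, and the backend names "pytorch" and "tensorrt" are assumptions for illustration, not the TensorRT-LLM class hierarchy.

```python
class SpecDecodingConfigBase:
    """Hypothetical base class: by default every backend is supported."""

    def supports_backend(self, backend: str) -> bool:
        return True


class SingleBackendSpecConfig(SpecDecodingConfigBase):
    """Hypothetical algorithm implemented for one backend only."""

    SUPPORTED_BACKENDS = frozenset({"pytorch"})  # assumed backend name

    def supports_backend(self, backend: str) -> bool:
        return backend in self.SUPPORTED_BACKENDS


def check_backend(config: SpecDecodingConfigBase, backend: str) -> None:
    """Fail early when the chosen backend cannot run this algorithm."""
    if not config.supports_backend(backend):
        raise ValueError(f"{type(config).__name__} does not support backend {backend!r}")


check_backend(SpecDecodingConfigBase(), "tensorrt")  # accepted
check_backend(SingleBackendSpecConfig(), "pytorch")  # accepted
```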

classmethod update_forward_refs(**localns: Any) → None

validate() → None: Hook for any additional error checking.

validator validate_draft_len_schedule_and_sort » draft_len_schedule: Validate and sort draft_len_schedule by batch-size thresholds.

decoding_type: ClassVar[str] = 'Eagle'

property is_linear_tree: bool

model_computed_fields = {}

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}: Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).

property model_extra: dict[str, Any] | None

Get extra fields set during validation.

Returns: A dictionary of extra fields, or None if config.extra is not set to “allow”.

model_fields = {'acceptance_length_threshold': FieldInfo(annotation=Union[float, NoneType], required=False, default=None), 'acceptance_window': FieldInfo(annotation=Union[int, NoneType], required=False, default=None), 'draft_len_schedule': FieldInfo(annotation=Union[dict[int, int], NoneType], required=False, default=None), 'dynamic_tree_max_topK': FieldInfo(annotation=Union[int, NoneType], required=False, default=None), 'eagle3_layers_to_capture': FieldInfo(annotation=Union[Set[int], NoneType], required=False, default=None), 'eagle3_one_model': FieldInfo(annotation=Union[bool, NoneType], required=False, default=True), 'eagle_choices': FieldInfo(annotation=Union[List[List[int]], NoneType], required=False, default=None), 'greedy_sampling': FieldInfo(annotation=Union[bool, NoneType], required=False, default=True), 'load_format': FieldInfo(annotation=Union[str, NoneType], required=False, default=None), 'max_concurrency': FieldInfo(annotation=Union[int, NoneType], required=False, default=None), 'max_draft_len': FieldInfo(annotation=Union[int, NoneType], required=False, default=None), 'max_non_leaves_per_layer': FieldInfo(annotation=Union[int, NoneType], required=False, default=None), 'max_total_draft_tokens': FieldInfo(annotation=Union[int, NoneType], required=False, default=None), 'num_eagle_layers': FieldInfo(annotation=Union[int, NoneType], required=False, default=None), 'posterior_threshold': FieldInfo(annotation=Union[float, NoneType], required=False, default=None), 'speculative_model_dir': FieldInfo(annotation=Union[str, Path, NoneType], required=False, default=None), 'use_dynamic_tree': FieldInfo(annotation=Union[bool, NoneType], required=False, default=False)}

property model_fields_set: set[str]

Returns the set of fields that have been explicitly set on this model instance.

Returns:

A set of strings representing the fields that have been set, i.e. that were not filled from defaults.

property num_capture_layers: int: Returns the number of target-model layers to capture. If eagle3_layers_to_capture is not None, this is the size of that set; otherwise the default Eagle3 layer set is assumed and 3 is returned.
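The rule stated for num_capture_layers reduces to a two-branch function. This free-function version is a hypothetical sketch that follows the property description only; the layer indices in the usage lines are arbitrary.

```python
from typing import Optional, Set

# Value taken from the property description: the default Eagle3 set has 3 layers.
DEFAULT_EAGLE3_CAPTURE_LAYERS = 3


def num_capture_layers(eagle3_layers_to_capture: Optional[Set[int]]) -> int:
    """Hypothetical free-function version of the num_capture_layers property."""
    if eagle3_layers_to_capture is not None:
        return len(eagle3_layers_to_capture)
    return DEFAULT_EAGLE3_CAPTURE_LAYERS


print(num_capture_layers(None))            # 3
print(num_capture_layers({1, 5, 9, 13}))   # 4
```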

property spec_dec_mode

class tensorrt_llm.llmapi.MTPDecodingConfig(*, max_draft_len: int | None = None, max_total_draft_tokens: int | None = None, speculative_model_dir: str | Path | None = None, max_concurrency: int | None = None, draft_len_schedule: dict[int, int] | None = None, load_format: str | None = None, acceptance_window: int | None = None, acceptance_length_threshold: float | None = None, num_nextn_predict_layers: int = 1, use_relaxed_acceptance_for_thinking: bool = False, relaxed_topk: int = 1, relaxed_delta: float = 0.0, use_mtp_vanilla: bool = False, mtp_eagle_one_model: bool = True, num_nextn_predict_layers_from_model_config: int = 1, begin_thinking_phase_token: int = 128798, end_thinking_phase_token: int = 128799)

Bases: DecodingBaseConfig

field acceptance_length_threshold: float | None = None

field acceptance_window: int | None = None

field begin_thinking_phase_token: int = 128798

field draft_len_schedule: dict[int, int] | None = None

field end_thinking_phase_token: int = 128799

field load_format: str | None = None

field max_concurrency: int | None = None

field max_draft_len: int | None = None

field max_total_draft_tokens: int | None = None

field mtp_eagle_one_model: bool = True

field num_nextn_predict_layers: int = 1

field num_nextn_predict_layers_from_model_config: int = 1

field relaxed_delta: float = 0.0

field relaxed_topk: int = 1

field speculative_model_dir: str | Path | None = None

field use_mtp_vanilla: bool = False

field use_relaxed_acceptance_for_thinking: bool = False

class Config

Bases: object

extra = 'forbid'

__init__(**kwargs)

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError (pydantic_core.ValidationError) if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

classmethod construct(_fields_set: set[str] | None = None, **values: Any) → Self

copy(*, include=None, exclude=None, update=None, deep=False) → Self

Returns a copy of the model.

Warning (deprecated): This method is deprecated; use model_copy instead.

If you need include or exclude, use:

`data = self.model_dump(include=include, exclude=exclude, round_trip=True)`
`data = {**data, **(update or {})}`
`copied = self.model_validate(data)`

Parameters:

include – Optional set or mapping specifying which fields to include in the copied model.
exclude – Optional set or mapping specifying which fields to exclude in the copied model.
update – Optional dictionary of field-value pairs to override field values in the copied model.
deep – If True, the values of fields that are Pydantic models will be deep-copied.

Returns:

A copy of the model with included, excluded and updated fields as specified.

dict(*, include: set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None = None, exclude: set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None = None, by_alias: bool = False, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False) → Dict[str, Any]

classmethod from_dict(data: dict)

classmethod from_orm(obj: Any) → Self

json(
*,
include: set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None = None,
exclude: set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None = None,
by_alias: bool = False,
exclude_unset: bool = False,
exclude_defaults: bool = False,
exclude_none: bool = False,
encoder: Callable[[Any], Any] | None = PydanticUndefined,
models_as_dict: bool = PydanticUndefined,
**dumps_kwargs: Any,
) → str

classmethod model_construct(_fields_set: set[str] | None = None, **values: Any) → Self

Creates a new instance of the Model class from trusted, already-validated data.

Creates a new model setting __dict__ and __pydantic_fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed.

Note: model_construct() generally respects the model_config.extra setting on the provided model. That is, if model_config.extra == 'allow', then all extra passed values are added to the model instance's __dict__ and __pydantic_extra__ fields. If model_config.extra == 'ignore' (the default), then all extra passed values are ignored. Because no validation is performed with a call to model_construct(), having model_config.extra == 'forbid' does not result in an error if extra values are passed, but they will be ignored.

Parameters:

_fields_set – A set of field names that were originally explicitly set during instantiation. If provided, this is directly used for the model_fields_set attribute (pydantic.BaseModel.model_fields_set). Otherwise, the field names from the values argument will be used.
values – Trusted or pre-validated data dictionary.

Returns:

A new instance of the Model class, populated without validation.

model_copy(*, update: Mapping[str, Any] | None = None, deep: bool = False) → Self

Usage documentation: model_copy (Pydantic serialization concepts).

Returns a copy of the model.

Note: The underlying instance's __dict__ attribute (object.__dict__) is copied. This might have unexpected side effects if you store anything in it on top of the model fields (e.g. the value of cached properties, functools.cached_property).

Parameters:

update – Values to change/add in the new model. Note: the data is not validated before creating the new model. You should trust this data.
deep – Set to True to make a deep copy of the model.

Returns:

New model instance.

Usage documentation: model_dump (Pydantic serialization concepts).

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

Parameters:

mode – The mode in which to_python should run. If mode is 'json', the output will only contain JSON-serializable types. If mode is 'python', the output may contain non-JSON-serializable Python objects.
include – A set of fields to include in the output.
exclude – A set of fields to exclude from the output.
context – Additional context to pass to the serializer.
by_alias – Whether to use the field’s alias in the dictionary key if defined.
exclude_unset – Whether to exclude fields that have not been explicitly set.
exclude_defaults – Whether to exclude fields that are set to their default value.
exclude_none – Whether to exclude fields that have a value of None.
round_trip – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings – How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors, "error" raises a PydanticSerializationError (pydantic_core.PydanticSerializationError).
fallback – A function to call when an unknown value is encountered. If not provided, a PydanticSerializationError (pydantic_core.PydanticSerializationError) is raised.
serialize_as_any – Whether to serialize fields with duck-typing serialization behavior.

Returns:

A dictionary representation of the model.

Usage documentation: model_dump_json (Pydantic serialization concepts).

Generates a JSON representation of the model using Pydantic's to_json method.

Parameters:

indent – Indentation to use in the JSON output. If None is passed, the output will be compact.
include – Field(s) to include in the JSON output.
exclude – Field(s) to exclude from the JSON output.
context – Additional context to pass to the serializer.
by_alias – Whether to serialize using field aliases.
exclude_unset – Whether to exclude fields that have not been explicitly set.
exclude_defaults – Whether to exclude fields that are set to their default value.
exclude_none – Whether to exclude fields that have a value of None.
round_trip – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings – How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors, "error" raises a PydanticSerializationError (pydantic_core.PydanticSerializationError).
fallback – A function to call when an unknown value is encountered. If not provided, a PydanticSerializationError (pydantic_core.PydanticSerializationError) is raised.
serialize_as_any – Whether to serialize fields with duck-typing serialization behavior.

Returns:

A JSON string representation of the model.

Generates a JSON schema for a model class.

Parameters:

by_alias – Whether to use attribute aliases or not.
ref_template – The reference template.
schema_generator – To override the logic used to generate the JSON schema, as a subclass of GenerateJsonSchema with your desired modifications.
mode – The mode in which to generate the schema.

Returns:

The JSON schema for the given model class.

classmethod model_parametrized_name(params: tuple[type[Any], ...]) → str

Compute the class name for parametrizations of generic classes.

This method can be overridden to achieve a custom naming scheme for generic BaseModels.

Parameters: params – Tuple of types of the class. Given a generic class Model with 2 type variables and a concrete model Model[str, int], the value (str, int) would be passed to params.
Returns: String representing the new class where params are passed to cls as type variables.
Raises: TypeError – Raised when trying to generate concrete names for non-generic models.

model_post_init(context: Any, /) → None

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:

self – The BaseModel instance.
context – The context.

classmethod model_rebuild(*, force: bool = False, raise_errors: bool = True, _parent_namespace_depth: int = 2, _types_namespace: MappingNamespace | None = None) → bool | None

Try to rebuild the pydantic-core schema for the model.

This may be necessary when one of the annotations is a ForwardRef which could not be resolved during the initial attempt to build the schema, and automatic rebuilding fails.

Parameters:

force – Whether to force the rebuilding of the model schema, defaults to False.
raise_errors – Whether to raise errors, defaults to True.
_parent_namespace_depth – The depth level of the parent namespace, defaults to 2.
_types_namespace – The types namespace, defaults to None.

Returns:

Returns None if the schema is already “complete” and rebuilding was not required. If rebuilding was required, returns True if rebuilding was successful, otherwise False.

Validate a pydantic model instance.

Parameters:

obj – The object to validate.
strict – Whether to enforce types strictly.
from_attributes – Whether to extract data from object attributes.
context – Additional context to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Raises:

ValidationError – If the object could not be validated.

Returns:

The validated model instance.

Usage documentation: JSON parsing (Pydantic JSON concepts).

Validate the given JSON data against the Pydantic model.

Parameters:

json_data – The JSON data to validate.
strict – Whether to enforce types strictly.
context – Extra variables to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Returns:

The validated Pydantic model.

Raises:

ValidationError – If json_data is not a JSON string or the object could not be validated.

classmethod model_validate_strings(obj: Any, *, strict: bool | None = None, context: Any | None = None, by_alias: bool | None = None, by_name: bool | None = None) → Self

Validate the given object with string data against the Pydantic model.

Parameters:

obj – The object containing string data to validate.
strict – Whether to enforce types strictly.
context – Extra variables to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Returns:

The validated Pydantic model.

classmethod parse_file(path: str | Path, *, content_type: str | None = None, encoding: str = 'utf8', proto: DeprecatedParseProtocol | None = None, allow_pickle: bool = False) → Self

classmethod parse_obj(obj: Any) → Self

classmethod parse_raw(b: str | bytes, *, content_type: str | None = None, encoding: str = 'utf8', proto: DeprecatedParseProtocol | None = None, allow_pickle: bool = False) → Self

classmethod schema(by_alias: bool = True, ref_template: str = '#/$defs/{model}') → Dict[str, Any]

classmethod schema_json(
*,
by_alias: bool = True,
ref_template: str = '#/$defs/{model}',
**dumps_kwargs: Any,
) → str

supports_backend(backend: str) → bool: Override this if the speculation algorithm supports only a subset of the possible backends.

classmethod update_forward_refs(**localns: Any) → None

validate() → None: Hook for any additional error checking.

validator validate_draft_len_schedule_and_sort » draft_len_schedule: Validate and sort draft_len_schedule by batch-size thresholds.

decoding_type: ClassVar[str] = 'MTP'

model_computed_fields = {}

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}: Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).

property model_extra: dict[str, Any] | None

Get extra fields set during validation.

Returns: A dictionary of extra fields, or None if config.extra is not set to “allow”.

model_fields = {'acceptance_length_threshold': FieldInfo(annotation=Union[float, NoneType], required=False, default=None), 'acceptance_window': FieldInfo(annotation=Union[int, NoneType], required=False, default=None), 'begin_thinking_phase_token': FieldInfo(annotation=int, required=False, default=128798), 'draft_len_schedule': FieldInfo(annotation=Union[dict[int, int], NoneType], required=False, default=None), 'end_thinking_phase_token': FieldInfo(annotation=int, required=False, default=128799), 'load_format': FieldInfo(annotation=Union[str, NoneType], required=False, default=None), 'max_concurrency': FieldInfo(annotation=Union[int, NoneType], required=False, default=None), 'max_draft_len': FieldInfo(annotation=Union[int, NoneType], required=False, default=None), 'max_total_draft_tokens': FieldInfo(annotation=Union[int, NoneType], required=False, default=None), 'mtp_eagle_one_model': FieldInfo(annotation=bool, required=False, default=True), 'num_nextn_predict_layers': FieldInfo(annotation=int, required=False, default=1), 'num_nextn_predict_layers_from_model_config': FieldInfo(annotation=int, required=False, default=1), 'relaxed_delta': FieldInfo(annotation=float, required=False, default=0.0), 'relaxed_topk': FieldInfo(annotation=int, required=False, default=1), 'speculative_model_dir': FieldInfo(annotation=Union[str, Path, NoneType], required=False, default=None), 'use_mtp_vanilla': FieldInfo(annotation=bool, required=False, default=False), 'use_relaxed_acceptance_for_thinking': FieldInfo(annotation=bool, required=False, default=False)}

property model_fields_set: set[str]

Returns the set of fields that have been explicitly set on this model instance.

Returns:

A set of strings representing the fields that have been set, i.e. that were not filled from defaults.

property num_capture_layers: int

property spec_dec_mode

class tensorrt_llm.llmapi.SchedulerConfig(*, capacity_scheduler_policy: CapacitySchedulerPolicy = CapacitySchedulerPolicy.GUARANTEED_NO_EVICT, context_chunking_policy: ContextChunkingPolicy | None = None, dynamic_batch_config: DynamicBatchConfig | None = None)

Bases: StrictBaseModel, PybindMirror

field capacity_scheduler_policy: CapacitySchedulerPolicy = CapacitySchedulerPolicy.GUARANTEED_NO_EVICT: The capacity scheduler policy to use.

field context_chunking_policy: ContextChunkingPolicy | None = None: The context chunking policy to use.

field dynamic_batch_config: DynamicBatchConfig | None = None: The dynamic batch config to use.

class Config

Bases: object

extra = 'forbid'

__init__(**data: Any) → None

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError (pydantic_core.ValidationError) if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

classmethod construct(_fields_set: set[str] | None = None, **values: Any) → Self

copy(*, include=None, exclude=None, update=None, deep=False) → Self

Returns a copy of the model.

Warning (deprecated): This method is deprecated; use model_copy instead.

If you need include or exclude, use:

`data = self.model_dump(include=include, exclude=exclude, round_trip=True)`
`data = {**data, **(update or {})}`
`copied = self.model_validate(data)`

Parameters:

include – Optional set or mapping specifying which fields to include in the copied model.
exclude – Optional set or mapping specifying which fields to exclude in the copied model.
update – Optional dictionary of field-value pairs to override field values in the copied model.
deep – If True, the values of fields that are Pydantic models will be deep-copied.

Returns:

A copy of the model with included, excluded and updated fields as specified.

dict( *, include:set[int]|set[str]|Mapping[int,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|Mapping[str,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|None=None, exclude:set[int]|set[str]|Mapping[int,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|Mapping[str,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|None=None, by_alias:bool=False, exclude_unset:bool=False, exclude_defaults:bool=False, exclude_none:bool=False, )→Dict[str,Any]

classmethod from_orm(obj:Any)→Self

classmethod from_pybind( pybind_instance:PybindMirror, )→T

Construct an instance of the given class from the fields in the given pybind class instance.

Parameters:

cls – Type of the class to construct; must be a subclass of pydantic BaseModel
pybind_instance – Instance of the pybind class to construct from its fields

Notes

When a field value is None in the pybind class, but the field is not optional and has a default value in the BaseModel class, the field gets the default value defined in the BaseModel class.

Returns: Instance of the given class, populated with the fields of the given pybind instance
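The fallback rule in the Notes can be sketched without pybind or Pydantic. In this hypothetical example, SimpleNamespace stands in for a pybind instance, and fields_from_pybind, the defaults mapping and the optional_fields set are illustrative names, not the actual from_pybind implementation:

```python
from types import SimpleNamespace
from typing import Any, Dict, Set


def fields_from_pybind(
    pybind_obj: Any, defaults: Dict[str, Any], optional_fields: Set[str]
) -> Dict[str, Any]:
    """Hypothetical sketch of the None-fallback rule described for from_pybind.

    Each field in `defaults` is read from `pybind_obj`. A None value is
    replaced by the Python-side default unless the field is optional, in
    which case None is kept.
    """
    result: Dict[str, Any] = {}
    for name, default in defaults.items():
        value = getattr(pybind_obj, name, None)
        if value is None and name not in optional_fields:
            value = default
        result[name] = value
    return result


# SimpleNamespace stands in for a pybind instance whose fields are unset (None).
cpp_side = SimpleNamespace(capacity_scheduler_policy=None, context_chunking_policy=None)
merged = fields_from_pybind(
    cpp_side,
    defaults={
        "capacity_scheduler_policy": "GUARANTEED_NO_EVICT",
        "context_chunking_policy": None,
    },
    optional_fields={"context_chunking_policy"},
)
print(merged)
# {'capacity_scheduler_policy': 'GUARANTEED_NO_EVICT', 'context_chunking_policy': None}
```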

static get_pybind_enum_fields(pybind_class)

Get all the enum fields from the pybind class.

static get_pybind_variable_fields(config_cls)

Get all the variable fields from the pybind class.

json(
*,
include:set[int]|set[str]|Mapping[int,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|Mapping[str,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|None=None,
exclude:set[int]|set[str]|Mapping[int,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|Mapping[str,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|None=None,
by_alias:bool=False,
exclude_unset:bool=False,
exclude_defaults:bool=False,
exclude_none:bool=False,
encoder:Callable[[Any],Any]|None=PydanticUndefined,
models_as_dict:bool=PydanticUndefined,
**dumps_kwargs:Any,
)→str

static maybe_to_pybind(ins)

static mirror_pybind_enum(pybind_class)

Mirror the enum fields from the pybind class to the Python class.

static mirror_pybind_fields(pybind_class)

Class decorator that ensures Python class fields mirror those of a C++ class.

Parameters: pybind_class – The C++ class whose fields should be mirrored
Returns: A decorator function that validates field mirroring
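A rough idea of what a field-mirroring check can look like, sketched as a plain decorator. mirror_fields_of, CppSchedulerConfig and PySchedulerConfig are hypothetical; a plain Python class stands in for the C++ binding, and the real mirror_pybind_fields may compare fields differently:

```python
from typing import Callable, Optional, TypeVar

C = TypeVar("C", bound=type)


def mirror_fields_of(cpp_class: type) -> Callable[[C], C]:
    """Hypothetical decorator: verify that the decorated class annotates exactly
    the public attributes exposed by `cpp_class` (a stand-in for a pybind class)."""
    cpp_fields = {name for name in vars(cpp_class) if not name.startswith("_")}

    def decorator(py_class: C) -> C:
        py_fields = set(getattr(py_class, "__annotations__", {}))
        missing, extra = cpp_fields - py_fields, py_fields - cpp_fields
        if missing or extra:
            raise TypeError(
                f"{py_class.__name__} does not mirror {cpp_class.__name__}: "
                f"missing={sorted(missing)}, extra={sorted(extra)}"
            )
        return py_class

    return decorator


class CppSchedulerConfig:  # hypothetical stand-in for a C++ binding
    capacity_scheduler_policy = None
    context_chunking_policy = None


@mirror_fields_of(CppSchedulerConfig)  # passes: both fields are declared
class PySchedulerConfig:
    capacity_scheduler_policy: str = "GUARANTEED_NO_EVICT"
    context_chunking_policy: Optional[str] = None
```

Failing at class-definition time means a field added on the C++ side without a Python counterpart is caught on import rather than at first use.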

classmethod model_construct(
_fields_set:set[str]|None=None,
**values:Any,
)→Self

Creates a new instance of the Model class with validated data.

Creates a new model setting __dict__ and __pydantic_fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed.

Note: model_construct() generally respects the model_config.extra setting on the provided model. That is, if model_config.extra == 'allow', then all extra passed values are added to the model instance’s __dict__ and __pydantic_extra__ fields. If model_config.extra == 'ignore' (the default), then all extra passed values are ignored. Because no validation is performed with a call to model_construct(), having model_config.extra == 'forbid' does not result in an error if extra values are passed, but they will be ignored.

Parameters:

_fields_set – A set of field names that were originally explicitly set during instantiation. If provided, this is directly used for the model_fields_set attribute. Otherwise, the field names from the values argument will be used.
values – Trusted or pre-validated data dictionary.

Returns:

A new instance of the Model class with validated data.

model_copy( *, update:Mapping[str,Any]|None=None, deep:bool=False, )→Self

Usage documentation: model_copy (Pydantic “Serialization” concepts page)

Returns a copy of the model.

Note: The underlying instance’s __dict__ attribute is copied. This might have unexpected side effects if you store anything in it, on top of the model fields (e.g. the value of cached properties).

Parameters:

update – Values to change/add in the new model. Note: the data is not validated before creating the new model. You should trust this data.
deep – Set to True to make a deep copy of the model.

Returns:

New model instance.
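The effect of deep and of the unvalidated update can be illustrated with the standard copy module. PlainConfig is a hypothetical stand-in that stores its fields in __dict__, which is roughly what the note above says model_copy copies; it is not a Pydantic model:

```python
import copy
from typing import Any, Dict, Optional


class PlainConfig:
    """Hypothetical stand-in (not a Pydantic model) that stores fields in __dict__."""

    def __init__(self, **fields: Any) -> None:
        self.__dict__.update(fields)

    def copy_with(
        self, update: Optional[Dict[str, Any]] = None, deep: bool = False
    ) -> "PlainConfig":
        # Copy __dict__ (deeply if requested), then apply `update` without any
        # validation, mirroring the contract documented for model_copy.
        data = copy.deepcopy(self.__dict__) if deep else dict(self.__dict__)
        data.update(update or {})
        return PlainConfig(**data)


base = PlainConfig(policy="GUARANTEED_NO_EVICT", tags=["a"])
shallow_copy = base.copy_with(update={"policy": "MAX_UTILIZATION"})
deep_copy = base.copy_with(deep=True)
shallow_copy.tags.append("b")  # the list object is shared with `base`
print(base.tags, deep_copy.tags)  # ['a', 'b'] ['a']
```

Because a shallow copy shares mutable field values with the original, deep=True is the safer choice when the copy will be mutated in place.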

Usage documentation: model_dump (Pydantic “Serialization” concepts page)

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

Parameters:

mode – The mode in which to_python should run. If mode is 'json', the output will only contain JSON serializable types. If mode is 'python', the output may contain non-JSON-serializable Python objects.
include – A set of fields to include in the output.
exclude – A set of fields to exclude from the output.
context – Additional context to pass to the serializer.
by_alias – Whether to use the field’s alias in the dictionary key if defined.
exclude_unset – Whether to exclude fields that have not been explicitly set.
exclude_defaults – Whether to exclude fields that are set to their default value.
exclude_none – Whether to exclude fields that have a value of None.
round_trip – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings – How to handle serialization errors. False/'none' ignores them, True/'warn' logs errors, 'error' raises a pydantic_core.PydanticSerializationError.
fallback – A function to call when an unknown value is encountered. If not provided, a pydantic_core.PydanticSerializationError error is raised.
serialize_as_any – Whether to serialize fields with duck-typing serialization behavior.

Returns:

A dictionary representation of the model.
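The three exclude_* flags answer different questions: was the field passed, does it equal its default, and is it None. A hypothetical dump_fields function, operating on plain dictionaries rather than a real model, makes the difference visible for a field that was passed explicitly with its default value:

```python
from typing import Any, Dict, Set


def dump_fields(
    values: Dict[str, Any],
    defaults: Dict[str, Any],
    fields_set: Set[str],
    *,
    exclude_unset: bool = False,
    exclude_defaults: bool = False,
    exclude_none: bool = False,
) -> Dict[str, Any]:
    """Hypothetical sketch of how the three exclude_* flags filter a dump."""
    out: Dict[str, Any] = {}
    for name, value in values.items():
        if exclude_unset and name not in fields_set:
            continue  # never passed explicitly
        if exclude_defaults and name in defaults and value == defaults[name]:
            continue  # equal to the declared default
        if exclude_none and value is None:
            continue
        out[name] = value
    return out


defaults = {
    "capacity_scheduler_policy": "GUARANTEED_NO_EVICT",
    "context_chunking_policy": None,
    "dynamic_batch_config": None,
}
# The policy was passed explicitly, but with its default value.
values, fields_set = dict(defaults), {"capacity_scheduler_policy"}

print(dump_fields(values, defaults, fields_set, exclude_unset=True))
# {'capacity_scheduler_policy': 'GUARANTEED_NO_EVICT'}
print(dump_fields(values, defaults, fields_set, exclude_defaults=True))
# {}
```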

Usage documentation: model_dump_json (Pydantic “Serialization” concepts page)

Generates a JSON representation of the model using Pydantic’s to_json method.

Parameters:

indent – Indentation to use in the JSON output. If None is passed, the output will be compact.
include – Field(s) to include in the JSON output.
exclude – Field(s) to exclude from the JSON output.
context – Additional context to pass to the serializer.
by_alias – Whether to serialize using field aliases.
exclude_unset – Whether to exclude fields that have not been explicitly set.
exclude_defaults – Whether to exclude fields that are set to their default value.
exclude_none – Whether to exclude fields that have a value of None.
round_trip – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings – How to handle serialization errors. False/'none' ignores them, True/'warn' logs errors, 'error' raises a pydantic_core.PydanticSerializationError.
fallback – A function to call when an unknown value is encountered. If not provided, a pydantic_core.PydanticSerializationError error is raised.
serialize_as_any – Whether to serialize fields with duck-typing serialization behavior.

Returns:

A JSON string representation of the model.

Generates a JSON schema for a model class.

Parameters:

by_alias – Whether to use attribute aliases or not.
ref_template – The reference template.
schema_generator – To override the logic used to generate the JSON schema, as a subclass of GenerateJsonSchema with your desired modifications
mode – The mode in which to generate the schema.

Returns:

The JSON schema for the given model class.

classmethod model_parametrized_name( params:tuple[type[Any],...], )→str

Compute the class name for parametrizations of generic classes.

This method can be overridden to achieve a custom naming scheme for generic BaseModels.

Parameters: params – Tuple of types of the class. Given a generic class Model with 2 type variables and a concrete model Model[str, int], the value (str, int) would be passed to params.
Returns: String representing the new class where params are passed to cls as type variables.
Raises: TypeError – Raised when trying to generate concrete names for non-generic models.
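For the docstring's own example, the default naming scheme produces a name of the form Model[str, int]. A minimal approximation that could serve as a starting point for a custom override; parametrized_name is hypothetical and only handles plain classes via __name__, whereas Pydantic's own formatting also covers generic aliases, unions and forward-reference strings:

```python
from typing import Any, Tuple, Type


def parametrized_name(cls_name: str, params: Tuple[Type[Any], ...]) -> str:
    """Hypothetical approximation of the ClassName[T1, T2, ...] naming scheme."""
    return f"{cls_name}[{', '.join(p.__name__ for p in params)}]"


print(parametrized_name("Model", (str, int)))  # Model[str, int]
```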

model_post_init(context:Any,/)→None

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

classmethod model_rebuild( *, force:bool=False, raise_errors:bool=True, _parent_namespace_depth:int=2, _types_namespace:MappingNamespace|None=None, )→bool|None

Try to rebuild the pydantic-core schema for the model.

This may be necessary when one of the annotations is a ForwardRef which could not be resolved during the initial attempt to build the schema, and automatic rebuilding fails.

Parameters:

force – Whether to force the rebuilding of the model schema, defaults to False.
raise_errors – Whether to raise errors, defaults to True.
_parent_namespace_depth – The depth level of the parent namespace, defaults to 2.
_types_namespace – The types namespace, defaults to None.

Returns:

Returns None if the schema is already “complete” and rebuilding was not required. If rebuilding was required, returns True if rebuilding was successful, otherwise False.

Validate a pydantic model instance.

Parameters:

obj – The object to validate.
strict – Whether to enforce types strictly.
from_attributes – Whether to extract data from object attributes.
context – Additional context to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Raises:

ValidationError – If the object could not be validated.

Returns:

The validated model instance.

Usage documentation: JSON Parsing (Pydantic “JSON” concepts page)

Validate the given JSON data against the Pydantic model.

Parameters:

json_data – The JSON data to validate.
strict – Whether to enforce types strictly.
context – Extra variables to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Returns:

The validated Pydantic model.

Raises:

ValidationError – If json_data is not a JSON string or the object could not be validated.

classmethod model_validate_strings( obj:Any, *, strict:bool|None=None, context:Any|None=None, by_alias:bool|None=None, by_name:bool|None=None, )→Self

Validate the given object with string data against the Pydantic model.

Parameters:

obj – The object containing string data to validate.
strict – Whether to enforce types strictly.
context – Extra variables to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Returns:

The validated Pydantic model.

classmethod parse_file( path:str|Path, *, content_type:str|None=None, encoding:str='utf8', proto:DeprecatedParseProtocol|None=None, allow_pickle:bool=False, )→Self

classmethod parse_obj(obj:Any)→Self

classmethod parse_raw( b:str|bytes, *, content_type:str|None=None, encoding:str='utf8', proto:DeprecatedParseProtocol|None=None, allow_pickle:bool=False, )→Self

static pybind_equals(obj0,obj1)

Check if two pybind objects are equal.

classmethod schema( by_alias:bool=True, ref_template:str='#/$defs/{model}', )→Dict[str,Any]

classmethod schema_json(
*,
by_alias:bool=True,
ref_template:str='#/$defs/{model}',
**dumps_kwargs:Any,
)→str

classmethod update_forward_refs(**localns:Any)→None

classmethod validate(value:Any)→Self

model_computed_fields={}

model_config:ClassVar[ConfigDict]={'extra':'forbid'}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

property model_extra:dict[str,Any]|None

Get extra fields set during validation.

Returns: A dictionary of extra fields, or None if config.extra is not set to 'allow'.

model_fields={'capacity_scheduler_policy':FieldInfo(annotation=CapacitySchedulerPolicy,required=False,default=<CapacitySchedulerPolicy.GUARANTEED_NO_EVICT: 'GUARANTEED_NO_EVICT'>,description='The capacity scheduler policy to use'),'context_chunking_policy':FieldInfo(annotation=Union[ContextChunkingPolicy,NoneType],required=False,default=None,description='The context chunking policy to use'),'dynamic_batch_config':FieldInfo(annotation=Union[DynamicBatchConfig,NoneType],required=False,default=None,description='The dynamic batch config to use')}

property model_fields_set:set[str]

Returns the set of fields that have been explicitly set on this model instance.

Returns:

A set of strings representing the fields that have been set, i.e. that were not filled from defaults.
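A field counts as set when it was passed explicitly, even if the value equals the default. The hypothetical FieldsSetDemo class below mimics that bookkeeping with the SchedulerConfig field names, using plain strings in place of the CapacitySchedulerPolicy enum; it is an illustration, not how Pydantic stores the set internally:

```python
from typing import Any, Dict

# Defaults copied from the SchedulerConfig field list (enum shown as a string).
SCHEDULER_DEFAULTS: Dict[str, Any] = {
    "capacity_scheduler_policy": "GUARANTEED_NO_EVICT",
    "context_chunking_policy": None,
    "dynamic_batch_config": None,
}


class FieldsSetDemo:
    """Hypothetical stand-in that records which fields were passed explicitly."""

    def __init__(self, **data: Any) -> None:
        for name, default in SCHEDULER_DEFAULTS.items():
            setattr(self, name, data.get(name, default))
        # Like model_fields_set: a field counts as set when it was passed,
        # even if the passed value equals the default. Unknown keys are ignored.
        self.fields_set = set(data) & set(SCHEDULER_DEFAULTS)


cfg = FieldsSetDemo(capacity_scheduler_policy="GUARANTEED_NO_EVICT")
print(cfg.fields_set)  # {'capacity_scheduler_policy'}
print(FieldsSetDemo().fields_set)  # set()
```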

class tensorrt_llm.llmapi.CapacitySchedulerPolicy( value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None, )

Bases: StrEnum

__init__(*args,**kwds)

capitalize()

Return a capitalized version of the string.

More specifically, make the first character have upper case and the rest lowercase.

casefold()

Return a version of the string suitable for caseless comparisons.

center(width,fillchar=' ',/)

Return a centered string of length width.

Padding is done using the specified fill character (default is a space).

count(sub[,start[,end]])→int

Return the number of non-overlapping occurrences of substring sub in string S[start:end]. Optional arguments start and end are interpreted as in slice notation.

encode(encoding='utf-8',errors='strict')

Encode the string using the codec registered for encoding.

encoding: The encoding in which to encode the string.
errors: The error handling scheme to use for encoding errors. The default is ‘strict’ meaning that encoding errors raise a UnicodeEncodeError. Other possible values are ‘ignore’, ‘replace’ and ‘xmlcharrefreplace’ as well as any other name registered with codecs.register_error that can handle UnicodeEncodeErrors.

endswith(suffix[,start[,end]])→bool

Return True if S ends with the specified suffix, False otherwise. With optional start, test S beginning at that position. With optional end, stop comparing S at that position. suffix can also be a tuple of strings to try.

expandtabs(tabsize=8)

Return a copy where all tab characters are expanded using spaces.

If tabsize is not given, a tab size of 8 characters is assumed.

find(sub[,start[,end]])→int

Return the lowest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.

Return -1 on failure.

format(*args,**kwargs)→str

Return a formatted version of S, using substitutions from args and kwargs. The substitutions are identified by braces (‘{’ and ‘}’).

format_map(mapping)→str

Return a formatted version of S, using substitutions from mapping. The substitutions are identified by braces (‘{’ and ‘}’).

index(sub[,start[,end]])→int

Return the lowest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.

Raises ValueError when the substring is not found.

isalnum()

Return True if the string is an alpha-numeric string, False otherwise.

A string is alpha-numeric if all characters in the string are alpha-numeric and there is at least one character in the string.

isalpha()

Return True if the string is an alphabetic string, False otherwise.

A string is alphabetic if all characters in the string are alphabetic and there is at least one character in the string.

isascii()

Return True if all characters in the string are ASCII, False otherwise.

ASCII characters have code points in the range U+0000-U+007F. Empty string is ASCII too.

isdecimal()

Return True if the string is a decimal string, False otherwise.

A string is a decimal string if all characters in the string are decimal and there is at least one character in the string.

isdigit()

Return True if the string is a digit string, False otherwise.

A string is a digit string if all characters in the string are digits and there is at least one character in the string.

isidentifier()

Return True if the string is a valid Python identifier, False otherwise.

Call keyword.iskeyword(s) to test whether string s is a reserved identifier, such as “def” or “class”.

islower()

Return True if the string is a lowercase string, False otherwise.

A string is lowercase if all cased characters in the string are lowercase and there is at least one cased character in the string.

isnumeric()

Return True if the string is a numeric string, False otherwise.

A string is numeric if all characters in the string are numeric and there is at least one character in the string.

isprintable()

Return True if the string is printable, False otherwise.

A string is printable if all of its characters are considered printable in repr() or if it is empty.

isspace()

Return True if the string is a whitespace string, False otherwise.

A string is whitespace if all characters in the string are whitespace and there is at least one character in the string.

istitle()

Return True if the string is a title-cased string, False otherwise.

In a title-cased string, upper- and title-case characters may only follow uncased characters and lowercase characters only cased ones.

isupper()

Return True if the string is an uppercase string, False otherwise.

A string is uppercase if all cased characters in the string are uppercase and there is at least one cased character in the string.

join(iterable,/)

Concatenate any number of strings.

The string whose method is called is inserted in between each given string. The result is returned as a new string.

Example: '.'.join(['ab', 'pq', 'rs']) -> 'ab.pq.rs'

ljust(width,fillchar=' ',/)

Return a left-justified string of length width.

Padding is done using the specified fill character (default is a space).

lower()

Return a copy of the string converted to lowercase.

lstrip(chars=None,/)

Return a copy of the string with leading whitespace removed.

If chars is given and not None, remove characters in chars instead.

static maketrans()

Return a translation table usable for str.translate().

If there is only one argument, it must be a dictionary mapping Unicode ordinals (integers) or characters to Unicode ordinals, strings or None. Character keys will be then converted to ordinals. If there are two arguments, they must be strings of equal length, and in the resulting dictionary, each character in x will be mapped to the character at the same position in y. If there is a third argument, it must be a string, whose characters will be mapped to None in the result.

partition(sep,/)

Partition the string into three parts using the given separator.

This will search for the separator in the string. If the separator is found, returns a 3-tuple containing the part before the separator, the separator itself, and the part after it.

If the separator is not found, returns a 3-tuple containing the original string and two empty strings.

removeprefix(prefix,/)

Return a str with the given prefix string removed if present.

If the string starts with the prefix string, return string[len(prefix):]. Otherwise, return a copy of the original string.

removesuffix(suffix,/)

Return a str with the given suffix string removed if present.

If the string ends with the suffix string and that suffix is not empty, return string[:-len(suffix)]. Otherwise, return a copy of the original string.

replace(old,new,count=-1,/)

Return a copy with all occurrences of substring old replaced by new.

count
Maximum number of occurrences to replace. -1 (the default value) means replace all occurrences.

If the optional argument count is given, only the first count occurrences are replaced.

rfind(sub[,start[,end]])→int

Return the highest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.

Return -1 on failure.

rindex(sub[,start[,end]])→int

Return the highest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.

Raises ValueError when the substring is not found.

rjust(width,fillchar=' ',/)

Return a right-justified string of length width.

Padding is done using the specified fill character (default is a space).

rpartition(sep,/)

Partition the string into three parts using the given separator.

This will search for the separator in the string, starting at the end. If the separator is found, returns a 3-tuple containing the part before the separator, the separator itself, and the part after it.

If the separator is not found, returns a 3-tuple containing two empty strings and the original string.

rsplit(sep=None,maxsplit=-1)

Return a list of the substrings in the string, using sep as the separator string.

sep
The separator used to split the string.
When set to None (the default value), will split on any whitespace character (including \n \r \t \f and spaces) and will discard empty strings from the result.
maxsplit
Maximum number of splits. -1 (the default value) means no limit.

Splitting starts at the end of the string and works to the front.

rstrip(chars=None,/)

Return a copy of the string with trailing whitespace removed.

If chars is given and not None, remove characters in chars instead.

split(sep=None,maxsplit=-1)

Return a list of the substrings in the string, using sep as the separator string.

sep
The separator used to split the string.
When set to None (the default value), will split on any whitespace character (including \n \r \t \f and spaces) and will discard empty strings from the result.
maxsplit
Maximum number of splits. -1 (the default value) means no limit.

Splitting starts at the front of the string and works to the end.

Note, str.split() is mainly useful for data that has been intentionally delimited. With natural text that includes punctuation, consider using the regular expression module.
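Because CapacitySchedulerPolicy derives from StrEnum, its members are str instances and accept the methods listed here. A short example of how split and rsplit (and their partition counterparts) differ in direction when maxsplit limits the result, and how sep=None differs from an explicit single space, using a member value as plain text:

```python
name = "GUARANTEED_NO_EVICT"

print(name.split("_", 1))    # ['GUARANTEED', 'NO_EVICT']
print(name.rsplit("_", 1))   # ['GUARANTEED_NO', 'EVICT']
print(name.partition("_"))   # ('GUARANTEED', '_', 'NO_EVICT')
print(name.rpartition("_"))  # ('GUARANTEED_NO', '_', 'EVICT')

# sep=None collapses runs of whitespace; an explicit " " keeps empty fields.
print("  a  b ".split())     # ['a', 'b']
print("  a  b ".split(" "))  # ['', '', 'a', '', 'b', '']
```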

splitlines(keepends=False)

Return a list of the lines in the string, breaking at line boundaries.

Line breaks are not included in the resulting list unless keepends is given and true.

startswith(prefix[,start[,end]])→bool

Return True if S starts with the specified prefix, False otherwise. With optional start, test S beginning at that position. With optional end, stop comparing S at that position. prefix can also be a tuple of strings to try.

strip(chars=None,/)

Return a copy of the string with leading and trailing whitespace removed.

If chars is given and not None, remove characters in chars instead.

swapcase()

Convert uppercase characters to lowercase and lowercase characters to uppercase.

title()

Return a version of the string where each word is titlecased.

More specifically, words start with uppercased characters and all remaining cased characters have lower case.

translate(table,/)

Replace each character in the string using the given translation table.

table
Translation table, which must be a mapping of Unicode ordinals to Unicode ordinals, strings, or None.

The table must implement lookup/indexing via __getitem__, for instance a dictionary or list. If this operation raises LookupError, the character is left untouched. Characters mapped to None are deleted.
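str.maketrans and translate are usually used together. An example covering the three-argument form (map characters, then delete the ones in the third argument) and the dictionary form, where mapping to None deletes a character:

```python
# Three-argument form: 'a' -> 'x', 'b' -> 'y', and every 'c' is deleted.
table = str.maketrans("ab", "xy", "c")
print("abcabc".translate(table))  # xyxy

# Dictionary form: character keys are converted to ordinals; None deletes.
table2 = str.maketrans({"_": " ", "!": None})
print("NO_EVICT!".translate(table2))  # NO EVICT
```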

upper()

Return a copy of the string converted to uppercase.

zfill(width,/)

Pad a numeric string with zeros on the left, to fill a field of the given width.

The string is never truncated.

GUARANTEED_NO_EVICT='GUARANTEED_NO_EVICT'

MAX_UTILIZATION='MAX_UTILIZATION'

STATIC_BATCH='STATIC_BATCH'
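The three members above are the values accepted by SchedulerConfig.capacity_scheduler_policy. Since the real class needs tensorrt_llm to import, the sketch below uses a hypothetical stand-in, CapacitySchedulerPolicyStandIn, built on (str, Enum) rather than StrEnum so it also runs on Python versions before 3.11. It shows value lookup (for example from a config string) and the str behavior that the inherited methods rely on; it says nothing about what each policy does:

```python
from enum import Enum


class CapacitySchedulerPolicyStandIn(str, Enum):
    """Hypothetical stand-in mirroring the three documented members."""

    GUARANTEED_NO_EVICT = "GUARANTEED_NO_EVICT"
    MAX_UTILIZATION = "MAX_UTILIZATION"
    STATIC_BATCH = "STATIC_BATCH"


# Lookup by value, e.g. when the policy comes from a config file or CLI flag.
policy = CapacitySchedulerPolicyStandIn("MAX_UTILIZATION")
print(policy is CapacitySchedulerPolicyStandIn.MAX_UTILIZATION)  # True

# Members are str instances: they compare equal to their values and support
# str methods such as lower().
print(policy == "MAX_UTILIZATION", policy.lower())  # True max_utilization
```

One difference to keep in mind: str() of a (str, Enum) member returns the ClassName.MEMBER form, whereas StrEnum members convert to their value.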

class tensorrt_llm.llmapi.BuildConfig( *, max_input_len:int=1024, max_seq_len:int|None=None, opt_batch_size:int=8, max_batch_size:int=2048, max_beam_width:int=1, max_num_tokens:int=8192, opt_num_tokens:int|None=None, max_prompt_embedding_table_size:int=0, kv_cache_type:tensorrt_llm.llmapi.kv_cache_type.KVCacheType|None=None, gather_context_logits:bool=False, gather_generation_logits:bool=False, strongly_typed:bool=True, force_num_profiles:int|None=None, profiling_verbosity:str='layer_names_only', enable_debug_output:bool=False, max_draft_len:int=0, speculative_decoding_mode:tensorrt_llm.models.modeling_utils.SpeculativeDecodingMode=<SpeculativeDecodingMode.NONE: 1>, use_refit:bool=False, input_timing_cache:str|None=None, output_timing_cache:str='model.cache', lora_config:tensorrt_llm.lora_helper.LoraConfig=<factory>, weight_sparsity:bool=False, weight_streaming:bool=False, plugin_config:tensorrt_llm.plugin.plugin.PluginConfig=<factory>, use_strip_plan:bool=False, max_encoder_input_len:int=1024, dry_run:bool=False, visualize_network:str|None=None, monitor_memory:bool=False, use_mrope:bool=False, )

Bases: BaseModel

Configuration class for TensorRT LLM engine building parameters.

This class contains all the configuration parameters needed to build a TensorRT LLM engine, including sequence length limits, batch sizes, optimization settings, and various features.

field dry_run:bool=False

Whether to perform a dry run without actually building the engine.

field enable_debug_output:bool=False

Whether to enable debug output during building.

field force_num_profiles:int|None=None

Force a specific number of optimization profiles. If None, the number is determined automatically.

field gather_context_logits:bool=False

Whether to gather logits during the context phase.

field gather_generation_logits:bool=False

Whether to gather logits during the generation phase.

field input_timing_cache:str|None=None

Path to the input timing cache file. If None, no input cache is used.

field kv_cache_type:KVCacheType|None=None

Type of KV cache to use (CONTINUOUS or PAGED). If None, defaults to PAGED.

field lora_config:LoraConfig [Optional]

Configuration for LoRA (Low-Rank Adaptation) fine-tuning.

field max_batch_size:int=2048

Maximum batch size the engine can handle.

field max_beam_width:int=1

Maximum beam width for beam search decoding.

field max_draft_len:int=0

Maximum length of draft tokens for speculative decoding.

field max_encoder_input_len:int=1024

Maximum encoder input length for encoder-decoder models.

field max_input_len:int=1024

Maximum length of input sequences.

field max_num_tokens:int=8192

Maximum number of batched input tokens after padding is removed in each batch.

field max_prompt_embedding_table_size:int=0

Maximum size of the prompt embedding table for prompt tuning.

field max_seq_len:int|None=None

The maximum possible sequence length for a single request, including both input and generated output tokens.

field monitor_memory:bool=False

Whether to monitor memory usage during building.

field opt_batch_size:int=8

Optimal batch size for engine optimization.

field opt_num_tokens:int|None=None

Optimal number of batched input tokens for engine optimization.

field output_timing_cache:str='model.cache'

Path to the output timing cache file.

field plugin_config:PluginConfig [Optional]

Configuration for TensorRT LLM plugins.

field profiling_verbosity:str='layer_names_only'

Verbosity level for TensorRT profiling ('layer_names_only', 'detailed', 'none').

field speculative_decoding_mode:SpeculativeDecodingMode=<SpeculativeDecodingMode.NONE: 1>

Mode for speculative decoding (NONE, MEDUSA, EAGLE, etc.).

field strongly_typed:bool=True

Whether to build a strongly typed TensorRT network.

field use_mrope:bool=False

Whether to use Multi-RoPE (Rotary Position Embedding) optimization.

field use_refit:bool=False

Whether to enable engine refitting capabilities.

field use_strip_plan:bool=False

Whether to use a stripped plan for engine building.

field visualize_network:str|None=None

Path to save the network visualization. If None, no visualization is generated.

field weight_sparsity:bool=False

Whether to enable weight sparsity optimization.

field weight_streaming:bool=False

Whether to enable weight streaming for large models.
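Several of these limits interact: a single request must satisfy max_input_len and max_seq_len (input plus generated tokens), while a batch is bounded by max_batch_size and max_num_tokens. A rough admission check under simplifying assumptions; EngineLimits, request_fits and context_batch_fits are hypothetical names, max_seq_len is set to an arbitrary 2048 for the example, and the actual runtime scheduler may account for tokens differently (for example with context chunking or generation-phase tokens):

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EngineLimits:
    """Hypothetical subset of BuildConfig. Defaults follow the field list above,
    except max_seq_len, which is set explicitly here (its documented default is None)."""

    max_input_len: int = 1024
    max_seq_len: int = 2048
    max_batch_size: int = 2048
    max_num_tokens: int = 8192


def request_fits(limits: EngineLimits, input_len: int, max_new_tokens: int) -> bool:
    # max_seq_len covers the input plus the generated tokens of one request.
    return (
        input_len <= limits.max_input_len
        and input_len + max_new_tokens <= limits.max_seq_len
    )


def context_batch_fits(limits: EngineLimits, input_lens: List[int]) -> bool:
    # Simplification: only context-phase (prompt) tokens are counted against
    # max_num_tokens, with no padding between requests.
    return (
        len(input_lens) <= limits.max_batch_size
        and sum(input_lens) <= limits.max_num_tokens
    )


limits = EngineLimits()
print(request_fits(limits, 1000, 1000))        # True  (2000 <= 2048)
print(request_fits(limits, 1000, 1100))        # False (2100 > 2048)
print(context_batch_fits(limits, [1024] * 8))  # True  (8192 <= 8192)
print(context_batch_fits(limits, [1024] * 9))  # False (9216 > 8192)
```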

__init__(**data:Any)→None

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError (pydantic_core.ValidationError) if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

classmethod construct(
_fields_set:set[str]|None=None,
**values:Any,
)→Self

Returns a copy of the model.

Deprecated: This method is deprecated; use model_copy instead.

If you need include or exclude, use:

```python
data = self.model_dump(include=include, exclude=exclude, round_trip=True)
data = {**data, **(update or {})}
copied = self.model_validate(data)
```

Parameters:

include – Optional set or mapping specifying which fields to include in the copied model.
exclude – Optional set or mapping specifying which fields to exclude in the copied model.
update – Optional dictionary of field-value pairs to override field values in the copied model.
deep – If True, the values of fields that are Pydantic models will be deep-copied.

Returns:

A copy of the model with included, excluded and updated fields as specified.

dict( *, include:set[int]|set[str]|Mapping[int,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|Mapping[str,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|None=None, exclude:set[int]|set[str]|Mapping[int,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|Mapping[str,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|None=None, by_alias:bool=False, exclude_unset:bool=False, exclude_defaults:bool=False, exclude_none:bool=False, )→Dict[str,Any]

classmethod from_json_file(config_file)

classmethod from_orm(obj:Any)→Self

json(
*,
include:set[int]|set[str]|Mapping[int,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|Mapping[str,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|None=None,
exclude:set[int]|set[str]|Mapping[int,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|Mapping[str,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|None=None,
by_alias:bool=False,
exclude_unset:bool=False,
exclude_defaults:bool=False,
exclude_none:bool=False,
encoder:Callable[[Any],Any]|None=PydanticUndefined,
models_as_dict:bool=PydanticUndefined,
**dumps_kwargs:Any,
)→str

classmethodmodel_construct(
_fields_set:set[str]|None=None,
**values:Any,
)→Self#

Creates a new instance of theModel class with validated data.

Creates a new model setting__dict__ and__pydantic_fields_set__ from trusted or pre-validated data.Default values are respected, but no other validation is performed.

!!! note: model_construct() generally respects themodel_config.extra setting on the provided model.That is, ifmodel_config.extra == ‘allow’, then all extra passed values are added to the model instance’s__dict__and__pydantic_extra__ fields. Ifmodel_config.extra == ‘ignore’ (the default), then all extra passed values are ignored.Because no validation is performed with a call tomodel_construct(), havingmodel_config.extra == ‘forbid’ does not result inan error if extra values are passed, but they will be ignored.

Parameters:

_fields_set – A set of field names that were originally explicitly set during instantiation. If provided,this is directly used for the [model_fields_set][pydantic.BaseModel.model_fields_set] attribute.Otherwise, the field names from thevalues argument will be used.
values – Trusted or pre-validated data dictionary.

Returns:

A new instance of the Model class with validated data.
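The difference between constructing and validating is easiest to see side by side. The sketch below assumes Pydantic v2 is installed and uses a hypothetical `Calib` model (not part of TensorRT-LLM): `model_validate` coerces input, while `model_construct` stores it untouched and only fills in defaults.

```python
from pydantic import BaseModel


class Calib(BaseModel):
    # Hypothetical model used only for illustration.
    calib_batches: int = 512
    device: str = "cuda"


# model_validate runs validation and coerces the string "64" to the int 64.
validated = Calib.model_validate({"calib_batches": "64"})

# model_construct skips validation entirely: the string is stored as-is,
# while fields that were not passed still receive their defaults.
constructed = Calib.model_construct(calib_batches="64")

print(type(validated.calib_batches).__name__)    # int
print(type(constructed.calib_batches).__name__)  # str
print(constructed.device)                        # cuda
print(constructed.model_fields_set)              # {'calib_batches'}
```

Only use model_construct() on data you already trust; a wrong type slips through silently, as the str value above shows.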

model_copy( *, update:Mapping[str,Any]|None=None, deep:bool=False, )→Self#

Usage documentation: see model_copy in the Pydantic serialization concepts guide.

Returns a copy of the model.

Note: The underlying instance's __dict__ attribute is copied. This might have unexpected side effects if you store anything in it on top of the model fields (e.g. the value of cached properties).

Parameters:

update – Values to change/add in the new model. Note: the data is not validated before creating the new model. You should trust this data.
deep – Set to True to make a deep copy of the model.

Returns:

New model instance.
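A minimal sketch of both parameters, assuming Pydantic v2 and a hypothetical `Limits` model: the `update` values bypass validation, and only `deep=True` detaches nested mutable values from the original.

```python
from typing import List

from pydantic import BaseModel


class Limits(BaseModel):
    # Hypothetical model for illustration only.
    max_batch_size: int = 2048
    tags: List[str] = []


base = Limits(tags=["a"])

# Shallow copy with an override. `update` is NOT validated, so the string
# below is stored verbatim instead of being coerced to an int.
shallow = base.model_copy(update={"max_batch_size": "16"})

# Deep copy: nested mutable values (the list) are copied as well.
deep = base.model_copy(deep=True)
deep.tags.append("b")

print(repr(shallow.max_batch_size))  # '16'
print(base.tags)                     # ['a'] (unaffected by the deep copy)
print(shallow.tags is base.tags)     # True (the shallow copy shares the list)
```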

Usage documentation: see model_dump in the Pydantic serialization concepts guide.

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

Parameters:

mode – The mode in which to_python should run. If mode is 'json', the output will only contain JSON serializable types. If mode is 'python', the output may contain non-JSON-serializable Python objects.
include – A set of fields to include in the output.
exclude – A set of fields to exclude from the output.
context – Additional context to pass to the serializer.
by_alias – Whether to use the field’s alias in the dictionary key if defined.
exclude_unset – Whether to exclude fields that have not been explicitly set.
exclude_defaults – Whether to exclude fields that are set to their default value.
exclude_none – Whether to exclude fields that have a value of None.
round_trip – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings – How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors, "error" raises a PydanticSerializationError.
fallback – A function to call when an unknown value is encountered. If not provided, a PydanticSerializationError is raised.
serialize_as_any – Whether to serialize fields with duck-typing serialization behavior.

Returns:

A dictionary representation of the model.
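The include/exclude filters can be illustrated with a short sketch. It assumes Pydantic v2; `Calib` is a hypothetical model whose defaults mirror a few CalibConfig fields.

```python
from typing import Optional

from pydantic import BaseModel


class Calib(BaseModel):
    # Hypothetical model for illustration only.
    calib_dataset: str = "cnn_dailymail"
    calib_batches: int = 512
    random_seed: Optional[int] = None


cfg = Calib(calib_batches=64)

full = cfg.model_dump()
non_default = cfg.model_dump(exclude_defaults=True)  # drops fields equal to their default
non_none = cfg.model_dump(exclude_none=True)         # drops fields whose value is None
only_dataset = cfg.model_dump(include={"calib_dataset"})

print(full)          # {'calib_dataset': 'cnn_dailymail', 'calib_batches': 64, 'random_seed': None}
print(non_default)   # {'calib_batches': 64}
print(non_none)      # {'calib_dataset': 'cnn_dailymail', 'calib_batches': 64}
print(only_dataset)  # {'calib_dataset': 'cnn_dailymail'}
```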

Usage documentation: see model_dump_json in the Pydantic serialization concepts guide.

Generates a JSON representation of the model using Pydantic's to_json method.

Parameters:

indent – Indentation to use in the JSON output. If None is passed, the output will be compact.
include – Field(s) to include in the JSON output.
exclude – Field(s) to exclude from the JSON output.
context – Additional context to pass to the serializer.
by_alias – Whether to serialize using field aliases.
exclude_unset – Whether to exclude fields that have not been explicitly set.
exclude_defaults – Whether to exclude fields that are set to their default value.
exclude_none – Whether to exclude fields that have a value of None.
round_trip – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings – How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors, "error" raises a PydanticSerializationError.
fallback – A function to call when an unknown value is encountered. If not provided, a PydanticSerializationError is raised.
serialize_as_any – Whether to serialize fields with duck-typing serialization behavior.

Returns:

A JSON string representation of the model.
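As a sketch of the `indent` behavior (assuming Pydantic v2 and a hypothetical two-field model), the compact and indented outputs differ only in whitespace, and the same field filters as model_dump apply:

```python
import json

from pydantic import BaseModel


class Calib(BaseModel):
    # Hypothetical two-field model for illustration only.
    calib_batches: int = 512
    device: str = "cuda"


cfg = Calib()

compact = cfg.model_dump_json()         # indent=None: no extra whitespace
pretty = cfg.model_dump_json(indent=2)  # multi-line, two-space indentation
slim = Calib(calib_batches=64).model_dump_json(exclude_defaults=True)

print(compact)  # {"calib_batches":512,"device":"cuda"}
print(slim)     # {"calib_batches":64}
print(json.loads(pretty) == json.loads(compact))  # True
```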

Generates a JSON schema for a model class.

Parameters:

by_alias – Whether to use attribute aliases or not.
ref_template – The reference template.
schema_generator – To override the logic used to generate the JSON schema, as a subclass of GenerateJsonSchema with your desired modifications.
mode – The mode in which to generate the schema.

Returns:

The JSON schema for the given model class.
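The parameters above match Pydantic v2's `BaseModel.model_json_schema()` classmethod. That identification is an assumption, since the signature line is missing from this page. A minimal sketch with a hypothetical model shows how field defaults and descriptions surface in the generated schema:

```python
from pydantic import BaseModel, Field


class Calib(BaseModel):
    # Hypothetical model; the description mirrors CalibConfig.calib_batches.
    calib_batches: int = Field(512, description="The number of batches that the calibration runs.")
    device: str = "cuda"


schema = Calib.model_json_schema()
props = schema["properties"]["calib_batches"]

print(schema["type"])                   # object
print(props["type"], props["default"])  # integer 512
print(props["description"])             # The number of batches that the calibration runs.
```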

classmethod model_parametrized_name( params:tuple[type[Any],...], )→str#

Compute the class name for parametrizations of generic classes.

This method can be overridden to achieve a custom naming scheme for generic BaseModels.

Parameters:: params – Tuple of types of the class. Given a generic class Model with 2 type variables and a concrete model Model[str, int], the value (str, int) would be passed to params.
Returns:: String representing the new class where params are passed to cls as type variables.
Raises:: TypeError – Raised when trying to generate concrete names for non-generic models.
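A minimal sketch of both the default naming and an override, assuming Pydantic v2. `Response` and `Box` are hypothetical generic models:

```python
from typing import Any, Generic, Tuple, Type, TypeVar

from pydantic import BaseModel

T = TypeVar("T")


class Response(BaseModel, Generic[T]):
    # Default naming: the type parameters are appended in brackets.
    data: T


class Box(BaseModel, Generic[T]):
    item: T

    @classmethod
    def model_parametrized_name(cls, params: Tuple[Type[Any], ...]) -> str:
        # Custom naming scheme: Box[int] -> "IntBox".
        return f"{params[0].__name__.title()}Box"


print(Response[int].__name__)   # Response[int]
print(Box[int].__name__)        # IntBox
print(Box[str](item="x").item)  # x
```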

model_post_init(context:Any,/)→None#: Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.
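A typical use is deriving state that depends on several already-validated fields. The sketch below assumes Pydantic v2. The `Calib` model and its `_total_samples` private attribute are hypothetical.

```python
from typing import Any

from pydantic import BaseModel, PrivateAttr


class Calib(BaseModel):
    # Hypothetical model for illustration only.
    calib_batches: int = 512
    calib_batch_size: int = 1
    _total_samples: int = PrivateAttr(default=0)

    def model_post_init(self, context: Any, /) -> None:
        # Runs once every field has been populated, so both values are available.
        self._total_samples = self.calib_batches * self.calib_batch_size


cfg = Calib(calib_batches=4, calib_batch_size=8)
print(cfg._total_samples)  # 32
```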

classmethod model_rebuild( *, force:bool=False, raise_errors:bool=True, _parent_namespace_depth:int=2, _types_namespace:MappingNamespace|None=None, )→bool|None#

Try to rebuild the pydantic-core schema for the model.

This may be necessary when one of the annotations is a ForwardRef which could not be resolved during the initial attempt to build the schema, and automatic rebuilding fails.

Parameters:

force – Whether to force the rebuilding of the model schema, defaults to False.
raise_errors – Whether to raise errors, defaults to True.
_parent_namespace_depth – The depth level of the parent namespace, defaults to 2.
_types_namespace – The types namespace, defaults to None.

Returns:

Returns None if the schema is already "complete" and rebuilding was not required. If rebuilding was required, returns True if rebuilding was successful, otherwise False.
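The return values described above can be observed with a forward reference that only becomes resolvable later. A minimal sketch, assuming Pydantic v2 and two hypothetical models defined at module level:

```python
from typing import Optional

from pydantic import BaseModel


class Parent(BaseModel):
    # "Child" is a forward reference: it does not exist yet when Parent is created.
    child: Optional["Child"] = None


class Child(BaseModel):
    name: str


first = Parent.model_rebuild()   # Child now resolves, so the schema is rebuilt
second = Parent.model_rebuild()  # schema already complete, nothing to do
print(first, second)             # True None

child_name = Parent(child={"name": "x"}).child.name
print(child_name)                # x
```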

Validate a pydantic model instance.

Parameters:

obj – The object to validate.
strict – Whether to enforce types strictly.
from_attributes – Whether to extract data from object attributes.
context – Additional context to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Raises:

ValidationError – If the object could not be validated.

Returns:

The validated model instance.
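The parameters above match Pydantic v2's `BaseModel.model_validate()`. That identification is an assumption, since the signature line is missing here. A sketch with a hypothetical model contrasts lax and strict validation and shows `from_attributes`:

```python
from types import SimpleNamespace

from pydantic import BaseModel, ValidationError


class Calib(BaseModel):
    # Hypothetical model for illustration only.
    calib_batches: int = 512


# Lax mode (the default) coerces the numeric string "64" to the int 64.
lax = Calib.model_validate({"calib_batches": "64"})
print(lax.calib_batches)  # 64

# strict=True disables that coercion, so the same input is rejected.
strict_failed = False
try:
    Calib.model_validate({"calib_batches": "64"}, strict=True)
except ValidationError as exc:
    strict_failed = True
    print(exc.error_count())  # 1

# from_attributes=True reads field values from object attributes instead of dict keys.
from_obj = Calib.model_validate(SimpleNamespace(calib_batches=32), from_attributes=True)
print(from_obj.calib_batches)  # 32
```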

Usage documentation: see JSON Parsing in the Pydantic JSON concepts guide.

Validate the given JSON data against the Pydantic model.

Parameters:

json_data – The JSON data to validate.
strict – Whether to enforce types strictly.
context – Extra variables to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Returns:

The validated Pydantic model.

Raises:

ValidationError – If json_data is not a JSON string or the object could not be validated.
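These parameters match Pydantic v2's `BaseModel.model_validate_json()`. The following sketch assumes that identification and uses a hypothetical model. It parses a JSON string directly, round-trips through model_dump_json, and shows the error raised for malformed input:

```python
from pydantic import BaseModel, ValidationError


class Calib(BaseModel):
    # Hypothetical model for illustration only.
    calib_dataset: str = "cnn_dailymail"
    calib_batches: int = 512


cfg = Calib.model_validate_json('{"calib_batches": 64}')
print(cfg.calib_batches, cfg.calib_dataset)  # 64 cnn_dailymail

# A dump/validate round trip reproduces an equal model.
round_trip_ok = Calib.model_validate_json(cfg.model_dump_json()) == cfg
print(round_trip_ok)  # True

error_type = None
try:
    Calib.model_validate_json("not json")
except ValidationError as exc:
    error_type = exc.errors()[0]["type"]
print(error_type)  # json_invalid
```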

classmethod model_validate_strings( obj:Any, *, strict:bool|None=None, context:Any|None=None, by_alias:bool|None=None, by_name:bool|None=None, )→Self#

Validate the given object with string data against the Pydantic model.

Parameters:

obj – The object containing string data to validate.
strict – Whether to enforce types strictly.
context – Extra variables to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Returns:

The validated Pydantic model.
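This method suits inputs where every value arrives as text, for example environment variables or CLI flags. A minimal sketch, assuming Pydantic v2 (2.6 or later) and a hypothetical model:

```python
from pydantic import BaseModel


class Calib(BaseModel):
    # Hypothetical model for illustration only.
    calib_batches: int
    use_cpu: bool


# Every value is a string; each is parsed into the annotated field type.
cfg = Calib.model_validate_strings({"calib_batches": "64", "use_cpu": "false"})
print(cfg.calib_batches, cfg.use_cpu)  # 64 False
```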

classmethod parse_file( path:str|Path, *, content_type:str|None=None, encoding:str='utf8', proto:DeprecatedParseProtocol|None=None, allow_pickle:bool=False, )→Self#

classmethod parse_obj(obj:Any)→Self#

classmethod parse_raw( b:str|bytes, *, content_type:str|None=None, encoding:str='utf8', proto:DeprecatedParseProtocol|None=None, allow_pickle:bool=False, )→Self#

classmethod schema( by_alias:bool=True, ref_template:str='#/$defs/{model}', )→Dict[str,Any]#

classmethod schema_json(
*,
by_alias:bool=True,
ref_template:str='#/$defs/{model}',
**dumps_kwargs:Any,
)→str#

classmethod update_forward_refs(**localns:Any)→None#

update_kv_cache_type(model_architecture:str)[source]#

classmethod validate(value:Any)→Self#

model_computed_fields={}#

model_config:ClassVar[ConfigDict]={}#: Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).

property model_extra:dict[str,Any]|None#

Get extra fields set during validation.

Returns:: A dictionary of extra fields, or None if config.extra is not set to "allow".

model_fields={
'dry_run': FieldInfo(annotation=bool, required=False, default=False, description='Whether to perform a dry run without actually building the engine.'),
'enable_debug_output': FieldInfo(annotation=bool, required=False, default=False, description='Whether to enable debug output during building.'),
'force_num_profiles': FieldInfo(annotation=Union[int, NoneType], required=False, default=None, description='Force a specific number of optimization profiles. If None, auto-determined.'),
'gather_context_logits': FieldInfo(annotation=bool, required=False, default=False, description='Whether to gather logits during context phase.'),
'gather_generation_logits': FieldInfo(annotation=bool, required=False, default=False, description='Whether to gather logits during generation phase.'),
'input_timing_cache': FieldInfo(annotation=Union[str, NoneType], required=False, default=None, description='Path to input timing cache file. If None, no input cache used.'),
'kv_cache_type': FieldInfo(annotation=Union[KVCacheType, NoneType], required=False, default=None, description='Type of KV cache to use (CONTINUOUS or PAGED). If None, defaults to PAGED.'),
'lora_config': FieldInfo(annotation=LoraConfig, required=False, default_factory=LoraConfig, description='Configuration for LoRA (Low-Rank Adaptation) fine-tuning.'),
'max_batch_size': FieldInfo(annotation=int, required=False, default=2048, description='Maximum batch size the engine can handle.'),
'max_beam_width': FieldInfo(annotation=int, required=False, default=1, description='Maximum beam width for beam search decoding.'),
'max_draft_len': FieldInfo(annotation=int, required=False, default=0, description='Maximum length of draft tokens for speculative decoding.'),
'max_encoder_input_len': FieldInfo(annotation=int, required=False, default=1024, description='Maximum encoder input length for encoder-decoder models.'),
'max_input_len': FieldInfo(annotation=int, required=False, default=1024, description='Maximum length of input sequences.'),
'max_num_tokens': FieldInfo(annotation=int, required=False, default=8192, description='Maximum number of batched input tokens after padding is removed in each batch.'),
'max_prompt_embedding_table_size': FieldInfo(annotation=int, required=False, default=0, description='Maximum size of prompt embedding table for prompt tuning.'),
'max_seq_len': FieldInfo(annotation=Union[int, NoneType], required=False, default=None, description='The maximum possible sequence length for a single request, including both input and generated output tokens.'),
'monitor_memory': FieldInfo(annotation=bool, required=False, default=False, description='Whether to monitor memory usage during building.'),
'opt_batch_size': FieldInfo(annotation=int, required=False, default=8, description='Optimal batch size for engine optimization.'),
'opt_num_tokens': FieldInfo(annotation=Union[int, NoneType], required=False, default=None, description='Optimal number of batched input tokens for engine optimization.'),
'output_timing_cache': FieldInfo(annotation=str, required=False, default='model.cache', description='Path to output timing cache file.'),
'plugin_config': FieldInfo(annotation=PluginConfig, required=False, default_factory=PluginConfig, description='Configuration for TensorRT LLM plugins.'),
'profiling_verbosity': FieldInfo(annotation=str, required=False, default='layer_names_only', description="Verbosity level for TensorRT profiling ('layer_names_only', 'detailed', 'none')."),
'speculative_decoding_mode': FieldInfo(annotation=SpeculativeDecodingMode, required=False, default=<SpeculativeDecodingMode.NONE: 1>, description='Mode for speculative decoding (NONE, MEDUSA, EAGLE, etc.).'),
'strongly_typed': FieldInfo(annotation=bool, required=False, default=True, description='Whether to use strongly_typed.'),
'use_mrope': FieldInfo(annotation=bool, required=False, default=False, description='Whether to use Multi-RoPE (Rotary Position Embedding) optimization.'),
'use_refit': FieldInfo(annotation=bool, required=False, default=False, description='Whether to enable engine refitting capabilities.'),
'use_strip_plan': FieldInfo(annotation=bool, required=False, default=False, description='Whether to use stripped plan for engine building.'),
'visualize_network': FieldInfo(annotation=Union[str, NoneType], required=False, default=None, description='Path to save network visualization. If None, no visualization generated.'),
'weight_sparsity': FieldInfo(annotation=bool, required=False, default=False, description='Whether to enable weight sparsity optimization.'),
'weight_streaming': FieldInfo(annotation=bool, required=False, default=False, description='Whether to enable weight streaming for large models.'),
}#

property model_fields_set:set[str]#

Returns the set of fields that have been explicitly set on this model instance.

Returns:

A set of strings representing the fields that have been set, i.e. that were not filled from defaults.
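The sketch below tracks which fields count as "set". It assumes Pydantic v2 and uses a hypothetical `Calib` model. Note that assigning an attribute after construction also marks that field as set.

```python
from pydantic import BaseModel


class Calib(BaseModel):
    # Hypothetical model for illustration only.
    calib_batches: int = 512
    random_seed: int = 1234


cfg = Calib(calib_batches=64)
print(cfg.model_fields_set)                # {'calib_batches'}
print(cfg.model_dump(exclude_unset=True))  # {'calib_batches': 64}

# Assigning an attribute after construction also marks it as set.
cfg.random_seed = 7
print(sorted(cfg.model_fields_set))        # ['calib_batches', 'random_seed']
```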

class tensorrt_llm.llmapi.QuantConfig( quant_algo:QuantAlgo|None=None, kv_cache_quant_algo:QuantAlgo|None=None, group_size:int=128, smoothquant_val:float=0.5, clamp_val:List[float]|None=None, use_meta_recipe:bool=False, has_zero_point:bool=False, pre_quant_scale:bool=False, exclude_modules:List[str]|None=None, mamba_ssm_cache_dtype:str|None=None, )[source]#

Bases: object

Serializable quantization configuration class, part of the PretrainedConfig.

Parameters:

quant_algo (tensorrt_llm.quantization.mode.QuantAlgo, optional) – Quantization algorithm. Defaults to None.
kv_cache_quant_algo (tensorrt_llm.quantization.mode.QuantAlgo, optional) – KV cache quantization algorithm. Defaults to None.
group_size (int) – The group size for group-wise quantization. Defaults to 128.
smoothquant_val (float) – The smoothing parameter alpha used in SmoothQuant. Defaults to 0.5.
clamp_val (List[float], optional) – The clamp values used in FP8 rowwise quantization. Defaults to None.
use_meta_recipe (bool) – Whether to use Meta’s recipe for FP8 rowwise quantization. Defaults to False.
has_zero_point (bool) – Whether to use a zero point for quantization. Defaults to False.
pre_quant_scale (bool) – Whether to use a pre-quantization scale. Defaults to False.
exclude_modules (List[str], optional) – The module name patterns that are skipped during quantization. Defaults to None.
mamba_ssm_cache_dtype (str, optional) – The data type for the Mamba SSM cache. Defaults to None.

__init__( quant_algo:QuantAlgo|None=None, kv_cache_quant_algo:QuantAlgo|None=None, group_size:int=128, smoothquant_val:float=0.5, clamp_val:List[float]|None=None, use_meta_recipe:bool=False, has_zero_point:bool=False, pre_quant_scale:bool=False, exclude_modules:List[str]|None=None, mamba_ssm_cache_dtype:str|None=None, )→None#

classmethod from_dict( config:dict, )→QuantConfig[source]#

Create a QuantConfig instance from a dict.

Parameters:: config (dict) – The dict used to create QuantConfig.
Returns:: The QuantConfig created from dict.
Return type:: tensorrt_llm.models.modeling_utils.QuantConfig

is_module_excluded_from_quantization(name:str)→bool[source]#

Check if the module is excluded from quantization.

Parameters:: name (str) – The name of the module.
Returns:: True if the module is excluded from quantization, False otherwise.
Return type:: bool

to_dict()→dict[source]#

Dump a QuantConfig instance to a dict.

Returns:: The dict dumped from QuantConfig.
Return type:: dict
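The three QuantConfig methods above can be approximated with a plain dataclass. This is a stdlib-only sketch, not TensorRT-LLM's implementation. It assumes the following:

* `QuantConfigSketch` is a hypothetical stand-in holding only four fields, with `quant_algo` stored as a plain string rather than a QuantAlgo member.
* `exclude_modules` entries are shell-style glob patterns, which is an assumption drawn from the phrase "module name patterns".

```python
import fnmatch
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class QuantConfigSketch:
    # Hypothetical subset of QuantConfig's fields, using the documented defaults.
    # quant_algo is a plain string here instead of a QuantAlgo member.
    quant_algo: Optional[str] = None
    group_size: int = 128
    smoothquant_val: float = 0.5
    exclude_modules: Optional[List[str]] = None

    @classmethod
    def from_dict(cls, config: dict) -> "QuantConfigSketch":
        # Assumes the dict keys match the field names exactly.
        return cls(**config)

    def to_dict(self) -> dict:
        return asdict(self)

    def is_module_excluded_from_quantization(self, name: str) -> bool:
        # Assumes exclude_modules holds shell-style glob patterns.
        if not self.exclude_modules:
            return False
        return any(fnmatch.fnmatchcase(name, pattern) for pattern in self.exclude_modules)


cfg = QuantConfigSketch(quant_algo="W4A16_AWQ", group_size=64, exclude_modules=["lm_head", "*.router"])
print(QuantConfigSketch.from_dict(cfg.to_dict()) == cfg)                # True
print(cfg.is_module_excluded_from_quantization("layers.0.mlp.router"))  # True
print(cfg.is_module_excluded_from_quantization("layers.0.attn.qkv"))    # False
```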

clamp_val:List[float]|None=None#

exclude_modules:List[str]|None=None#

group_size:int=128#

has_zero_point:bool=False#

kv_cache_quant_algo:QuantAlgo|None=None#

property layer_quant_mode:QuantMode#

mamba_ssm_cache_dtype:str|None=None#

pre_quant_scale:bool=False#

quant_algo:QuantAlgo|None=None#

property quant_mode:QuantModeWrapper#

smoothquant_val:float=0.5#

use_meta_recipe:bool=False#

class tensorrt_llm.llmapi.QuantAlgo( value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None, )[source]#

Bases: StrEnum

__init__(*args,**kwds)#

capitalize()#

Return a capitalized version of the string.

More specifically, make the first character have upper case and the rest lowercase.

casefold()#: Return a version of the string suitable for caseless comparisons.

center(width,fillchar=' ',/)#

Return a centered string of length width.

Padding is done using the specified fill character (default is a space).

count(sub[,start[,end]])→int#: Return the number of non-overlapping occurrences of substring sub in string S[start:end]. Optional arguments start and end are interpreted as in slice notation.

encode(encoding='utf-8',errors='strict')#

Encode the string using the codec registered for encoding.

encoding: The encoding in which to encode the string.
errors: The error handling scheme to use for encoding errors. The default is 'strict' meaning that encoding errors raise a UnicodeEncodeError. Other possible values are 'ignore', 'replace' and 'xmlcharrefreplace' as well as any other name registered with codecs.register_error that can handle UnicodeEncodeErrors.

endswith(suffix[,start[,end]])→bool#: Return True if S ends with the specified suffix, False otherwise. With optional start, test S beginning at that position. With optional end, stop comparing S at that position. suffix can also be a tuple of strings to try.

expandtabs(tabsize=8)#

Return a copy where all tab characters are expanded using spaces.

If tabsize is not given, a tab size of 8 characters is assumed.

find(sub[,start[,end]])→int#

Return the lowest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.

Return -1 on failure.

format(*args,**kwargs)→str#: Return a formatted version of S, using substitutions from args and kwargs. The substitutions are identified by braces ('{' and '}').

format_map(mapping)→str#: Return a formatted version of S, using substitutions from mapping. The substitutions are identified by braces ('{' and '}').

index(sub[,start[,end]])→int#

Return the lowest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.

Raises ValueError when the substring is not found.

isalnum()#

Return True if the string is an alpha-numeric string, False otherwise.

A string is alpha-numeric if all characters in the string are alpha-numeric and there is at least one character in the string.

isalpha()#

Return True if the string is an alphabetic string, False otherwise.

A string is alphabetic if all characters in the string are alphabetic and there is at least one character in the string.

isascii()#

Return True if all characters in the string are ASCII, False otherwise.

ASCII characters have code points in the range U+0000-U+007F. Empty string is ASCII too.

isdecimal()#

Return True if the string is a decimal string, False otherwise.

A string is a decimal string if all characters in the string are decimal and there is at least one character in the string.

isdigit()#

Return True if the string is a digit string, False otherwise.

A string is a digit string if all characters in the string are digits and there is at least one character in the string.

isidentifier()#

Return True if the string is a valid Python identifier, False otherwise.

Call keyword.iskeyword(s) to test whether string s is a reserved identifier, such as “def” or “class”.

islower()#

Return True if the string is a lowercase string, False otherwise.

A string is lowercase if all cased characters in the string are lowercase and there is at least one cased character in the string.

isnumeric()#

Return True if the string is a numeric string, False otherwise.

A string is numeric if all characters in the string are numeric and there is at least one character in the string.

isprintable()#

Return True if the string is printable, False otherwise.

A string is printable if all of its characters are considered printable in repr() or if it is empty.

isspace()#

Return True if the string is a whitespace string, False otherwise.

A string is whitespace if all characters in the string are whitespace and there is at least one character in the string.

istitle()#

Return True if the string is a title-cased string, False otherwise.

In a title-cased string, upper- and title-case characters may only follow uncased characters and lowercase characters only cased ones.

isupper()#

Return True if the string is an uppercase string, False otherwise.

A string is uppercase if all cased characters in the string are uppercase and there is at least one cased character in the string.

join(iterable,/)#

Concatenate any number of strings.

The string whose method is called is inserted in between each given string. The result is returned as a new string.

Example: '.'.join(['ab', 'pq', 'rs']) -> 'ab.pq.rs'

ljust(width,fillchar=' ',/)#

Return a left-justified string of length width.

Padding is done using the specified fill character (default is a space).

lower()#: Return a copy of the string converted to lowercase.

lstrip(chars=None,/)#

Return a copy of the string with leading whitespace removed.

If chars is given and not None, remove characters in chars instead.

static maketrans()#

Return a translation table usable for str.translate().

partition(sep,/)#

Partition the string into three parts using the given separator.

This will search for the separator in the string. If the separator is found, returns a 3-tuple containing the part before the separator, the separator itself, and the part after it.

If the separator is not found, returns a 3-tuple containing the original string and two empty strings.

removeprefix(prefix,/)#

Return a str with the given prefix string removed if present.

If the string starts with the prefix string, return string[len(prefix):]. Otherwise, return a copy of the original string.

removesuffix(suffix,/)#

Return a str with the given suffix string removed if present.

If the string ends with the suffix string and that suffix is not empty, return string[:-len(suffix)]. Otherwise, return a copy of the original string.

replace(old,new,count=-1,/)#

Return a copy with all occurrences of substring old replaced by new.

count
Maximum number of occurrences to replace. -1 (the default value) means replace all occurrences.

If the optional argument count is given, only the first count occurrences are replaced.

rfind(sub[,start[,end]])→int#

Return the highest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.

Return -1 on failure.

rindex(sub[,start[,end]])→int#

Return the highest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.

Raises ValueError when the substring is not found.

rjust(width,fillchar=' ',/)#

Return a right-justified string of length width.

Padding is done using the specified fill character (default is a space).

rpartition(sep,/)#

Partition the string into three parts using the given separator.

If the separator is not found, returns a 3-tuple containing two empty strings and the original string.

rsplit(sep=None,maxsplit=-1)#

Return a list of the substrings in the string, using sep as the separator string.

sep
The separator used to split the string.
When set to None (the default value), will split on any whitespace character (including \n \r \t \f and spaces) and will discard empty strings from the result.
maxsplit
Maximum number of splits. -1 (the default value) means no limit.

Splitting starts at the end of the string and works to the front.

rstrip(chars=None,/)#

Return a copy of the string with trailing whitespace removed.

If chars is given and not None, remove characters in chars instead.

split(sep=None,maxsplit=-1)#

Return a list of the substrings in the string, using sep as the separator string.

sep
The separator used to split the string.
When set to None (the default value), will split on any whitespace character (including \n \r \t \f and spaces) and will discard empty strings from the result.
maxsplit
Maximum number of splits. -1 (the default value) means no limit.

Splitting starts at the front of the string and works to the end.

Note, str.split() is mainly useful for data that has been intentionally delimited. With natural text that includes punctuation, consider using the regular expression module.

splitlines(keepends=False)#

Return a list of the lines in the string, breaking at line boundaries.

Line breaks are not included in the resulting list unless keepends is given and true.

startswith(prefix[,start[,end]])→bool#: Return True if S starts with the specified prefix, False otherwise. With optional start, test S beginning at that position. With optional end, stop comparing S at that position. prefix can also be a tuple of strings to try.

strip(chars=None,/)#

Return a copy of the string with leading and trailing whitespace removed.

If chars is given and not None, remove characters in chars instead.

swapcase()#: Convert uppercase characters to lowercase and lowercase characters to uppercase.

title()#

Return a version of the string where each word is titlecased.

More specifically, words start with uppercased characters and all remaining cased characters have lower case.

translate(table,/)#

Replace each character in the string using the given translation table.

table
Translation table, which must be a mapping of Unicode ordinals to Unicode ordinals, strings, or None.

upper()#: Return a copy of the string converted to uppercase.

zfill(width,/)#

Pad a numeric string with zeros on the left, to fill a field of the given width.

The string is never truncated.

FP8='FP8'#

FP8_BLOCK_SCALES='FP8_BLOCK_SCALES'#

FP8_PER_CHANNEL_PER_TOKEN='FP8_PER_CHANNEL_PER_TOKEN'#

INT8='INT8'#

MIXED_PRECISION='MIXED_PRECISION'#

NO_QUANT='NO_QUANT'#

NVFP4='NVFP4'#

W4A16='W4A16'#

W4A16_AWQ='W4A16_AWQ'#

W4A16_GPTQ='W4A16_GPTQ'#

W4A16_MXFP4='W4A16_MXFP4'#

W4A8_AWQ='W4A8_AWQ'#

W4A8_MXFP4_FP8='W4A8_MXFP4_FP8'#

W4A8_MXFP4_MXFP8='W4A8_MXFP4_MXFP8'#

W4A8_NVFP4_FP8='W4A8_NVFP4_FP8'#

W4A8_QSERVE_PER_CHANNEL='W4A8_QSERVE_PER_CHANNEL'#

W4A8_QSERVE_PER_GROUP='W4A8_QSERVE_PER_GROUP'#

W8A16='W8A16'#

W8A16_GPTQ='W8A16_GPTQ'#

W8A8_SQ_PER_CHANNEL='W8A8_SQ_PER_CHANNEL'#

W8A8_SQ_PER_CHANNEL_PER_TENSOR_PLUGIN='W8A8_SQ_PER_CHANNEL_PER_TENSOR_PLUGIN'#

W8A8_SQ_PER_CHANNEL_PER_TOKEN_PLUGIN='W8A8_SQ_PER_CHANNEL_PER_TOKEN_PLUGIN'#

W8A8_SQ_PER_TENSOR_PER_TOKEN_PLUGIN='W8A8_SQ_PER_TENSOR_PER_TOKEN_PLUGIN'#

W8A8_SQ_PER_TENSOR_PLUGIN='W8A8_SQ_PER_TENSOR_PLUGIN'#
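Because QuantAlgo derives from a string enum, each member above is also a `str`, which is why the inherited `str` methods are listed. The sketch below illustrates this with the standard library only. `QuantAlgoSketch` is a hypothetical three-member stand-in built on `(str, Enum)` rather than the real class.

```python
from enum import Enum


class QuantAlgoSketch(str, Enum):
    # Hypothetical three-member stand-in for QuantAlgo; the real class derives
    # from StrEnum and defines all of the members listed above.
    FP8 = "FP8"
    NVFP4 = "NVFP4"
    W4A16_AWQ = "W4A16_AWQ"


algo = QuantAlgoSketch("W4A16_AWQ")       # look up a member by its string value
print(algo is QuantAlgoSketch.W4A16_AWQ)  # True
print(algo == "W4A16_AWQ")                # True: members compare equal to plain strings
print(algo.startswith("W4A16"))           # True: inherited str methods act on the value
print(algo.lower())                       # w4a16_awq
```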

class tensorrt_llm.llmapi.CalibConfig( *, device:Literal['cuda','cpu']='cuda', calib_dataset:str='cnn_dailymail', calib_batches:int=512, calib_batch_size:int=1, calib_max_seq_length:int=512, random_seed:int=1234, tokenizer_max_seq_length:int=2048, )[source]#

Bases: StrictBaseModel

Calibration configuration.

field calib_batch_size:int=1#: The batch size used during calibration.

field calib_batches:int=512#: The number of batches that the calibration runs.

field calib_dataset:str='cnn_dailymail'#: The name or local path of the calibration dataset.

field calib_max_seq_length:int=512#: The maximum sequence length used during calibration.

field device:Literal['cuda','cpu']='cuda'#: The device on which to run calibration.

field random_seed:int=1234#: The random seed used for calibration.

field tokenizer_max_seq_length:int=2048#: The maximum sequence length used to initialize the tokenizer for calibration.

class Config#

Bases: object

extra='forbid'#
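With `extra='forbid'`, a misspelled keyword raises an error rather than being silently dropped. The sketch below assumes Pydantic v2. `StrictCalib` is a hypothetical stand-in that expresses the same setting through `ConfigDict` instead of the nested `Config` class shown above. It also shows the `model_construct()` caveat noted earlier on this page.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class StrictCalib(BaseModel):
    # Hypothetical stand-in that mirrors CalibConfig's extra='forbid' setting.
    model_config = ConfigDict(extra="forbid")

    calib_batches: int = 512


# A misspelled field name is rejected instead of being silently dropped.
error_type = None
try:
    StrictCalib(calib_batchs=64)
except ValidationError as exc:
    error_type = exc.errors()[0]["type"]
print(error_type)  # extra_forbidden

# model_construct() skips validation, so the same typo is ignored without error.
constructed = StrictCalib.model_construct(calib_batchs=64)
print(constructed.calib_batches, hasattr(constructed, "calib_batchs"))  # 512 False
```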

__init__(**data:Any)→None#

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError (pydantic_core.ValidationError) if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

classmethod construct(
_fields_set:set[str]|None=None,
**values:Any,
)→Self#

Returns a copy of the model.

Deprecated: This method is now deprecated; use model_copy instead.

If you need include or exclude, use:

```python
data = self.model_dump(include=include, exclude=exclude, round_trip=True)
data = {**data, **(update or {})}
copied = self.model_validate(data)
```
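The replacement recipe can be run end to end on a concrete instance. The sketch below assumes Pydantic v2 and uses a hypothetical `Calib` model. Unlike `model_copy(update=...)`, the recipe re-validates the merged data, so the string `"7"` becomes an int. An excluded field falls back to its default.

```python
from pydantic import BaseModel


class Calib(BaseModel):
    # Hypothetical model for illustration only.
    calib_batches: int = 512
    random_seed: int = 1234
    device: str = "cuda"


original = Calib(calib_batches=64, device="cpu")
include, exclude, update = None, {"device"}, {"random_seed": "7"}

# Replacement for the deprecated copy(include=..., exclude=..., update=...):
data = original.model_dump(include=include, exclude=exclude, round_trip=True)
data = {**data, **(update or {})}
copied = original.model_validate(data)

print(copied.calib_batches)  # 64
print(copied.random_seed)    # 7 (re-validated, so the string "7" became an int)
print(copied.device)         # cuda (excluded, so the default applies again)
```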

Parameters:

include – Optional set or mapping specifying which fields to include in the copied model.
exclude – Optional set or mapping specifying which fields to exclude in the copied model.
update – Optional dictionary of field-value pairs to override field values in the copied model.
deep – If True, the values of fields that are Pydantic models will be deep-copied.

Returns:

A copy of the model with included, excluded and updated fields as specified.

dict( *, include:set[int]|set[str]|Mapping[int,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|Mapping[str,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|None=None, exclude:set[int]|set[str]|Mapping[int,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|Mapping[str,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|None=None, by_alias:bool=False, exclude_unset:bool=False, exclude_defaults:bool=False, exclude_none:bool=False, )→Dict[str,Any]#

classmethod from_dict( config:dict, )→CalibConfig[source]#

Create a CalibConfig instance from a dict.

Parameters:: config (dict) – The dict used to create CalibConfig.
Returns:: The CalibConfig created from dict.
Return type:: tensorrt_llm.llmapi.CalibConfig

classmethod from_orm(obj:Any)→Self#

json(
*,
include:set[int]|set[str]|Mapping[int,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|Mapping[str,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|None=None,
exclude:set[int]|set[str]|Mapping[int,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|Mapping[str,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|None=None,
by_alias:bool=False,
exclude_unset:bool=False,
exclude_defaults:bool=False,
exclude_none:bool=False,
encoder:Callable[[Any],Any]|None=PydanticUndefined,
models_as_dict:bool=PydanticUndefined,
**dumps_kwargs:Any,
)→str#

classmethod model_construct(
_fields_set:set[str]|None=None,
**values:Any,
)→Self#

Creates a new instance of the Model class with validated data.

Creates a new model, setting __dict__ and __pydantic_fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed.

Note: model_construct() generally respects the model_config.extra setting on the provided model. That is, if model_config.extra == 'allow', then all extra passed values are added to the model instance's __dict__ and __pydantic_extra__ fields. If model_config.extra == 'ignore' (the default), then all extra passed values are ignored. Because no validation is performed with a call to model_construct(), having model_config.extra == 'forbid' does not result in an error if extra values are passed, but they will be ignored.

Parameters:

_fields_set – A set of field names that were originally explicitly set during instantiation. If provided, this is directly used for the model_fields_set attribute. Otherwise, the field names from the values argument will be used.
values – Trusted or pre-validated data dictionary.

Returns:

A new instance of the Model class with validated data.

model_copy( *, update:Mapping[str,Any]|None=None, deep:bool=False, )→Self#

Usage documentation: see model_copy in the Pydantic serialization concepts guide.

Returns a copy of the model.

Note: The underlying instance's __dict__ attribute is copied. This might have unexpected side effects if you store anything in it on top of the model fields (e.g. the value of cached properties).

Parameters:

update – Values to change or add in the new model. Note: the data is not validated before the new model is created, so only pass trusted data.
deep – Set to True to make a deep copy of the model.

Returns:

New model instance.
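A minimal sketch of model_copy(update=...), assuming Pydantic v2 and a hypothetical CalibSketch stand-in (field names and defaults borrowed from CalibConfig). It also shows that update values bypass validation, as the parameter note warns.

```python
from pydantic import BaseModel, ConfigDict


# Hypothetical stand-in, not the real CalibConfig.
class CalibSketch(BaseModel):
    model_config = ConfigDict(extra="forbid")
    calib_batches: int = 512
    device: str = "cuda"


base = CalibSketch()
# Fields not named in update= are carried over from the original instance.
smaller = base.model_copy(update={"calib_batches": 64})

# update= is NOT validated: the string "64" stays a string instead of becoming 64.
unchecked = base.model_copy(update={"calib_batches": "64"})
```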

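model_dump( *, mode:Literal['json','python']|str='python', include:IncEx|None=None, exclude:IncEx|None=None, context:Any|None=None, by_alias:bool|None=None, exclude_unset:bool=False, exclude_defaults:bool=False, exclude_none:bool=False, round_trip:bool=False, warnings:bool|Literal['none','warn','error']=True, fallback:Callable[[Any],Any]|None=None, serialize_as_any:bool=False, )→dict[str,Any]#

Usage documentation: model_dump (Pydantic serialization concepts guide).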

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

Parameters:

mode – The mode in which to_python should run. If mode is 'json', the output will only contain JSON-serializable types. If mode is 'python', the output may contain non-JSON-serializable Python objects.
include – A set of fields to include in the output.
exclude – A set of fields to exclude from the output.
context – Additional context to pass to the serializer.
by_alias – Whether to use the field’s alias in the dictionary key if defined.
exclude_unset – Whether to exclude fields that have not been explicitly set.
exclude_defaults – Whether to exclude fields that are set to their default value.
exclude_none – Whether to exclude fields that have a value of None.
round_trip – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings – How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors, and "error" raises a PydanticSerializationError.
fallback – A function to call when an unknown value is encountered. If not provided, a PydanticSerializationError is raised.
serialize_as_any – Whether to serialize fields with duck-typing serialization behavior.

Returns:

A dictionary representation of the model.
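A sketch of the most common model_dump() filters, assuming Pydantic v2 and a hypothetical CalibSketch stand-in whose field names and defaults are taken from CalibConfig's model_fields.

```python
from typing import Literal

from pydantic import BaseModel, ConfigDict


# Hypothetical stand-in, not the real CalibConfig.
class CalibSketch(BaseModel):
    model_config = ConfigDict(extra="forbid")
    calib_batches: int = 512
    calib_dataset: str = "cnn_dailymail"
    device: Literal["cuda", "cpu"] = "cuda"


cfg = CalibSketch(calib_batches=64)
full = cfg.model_dump()                          # every field
changed = cfg.model_dump(exclude_defaults=True)  # only values differing from defaults
explicit = cfg.model_dump(exclude_unset=True)    # only values passed to the constructor
only_device = cfg.model_dump(include={"device"})
```

exclude_defaults compares against the declared default, while exclude_unset looks at model_fields_set; they differ when a field is explicitly set to its default value.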

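model_dump_json( *, indent:int|None=None, include:IncEx|None=None, exclude:IncEx|None=None, context:Any|None=None, by_alias:bool|None=None, exclude_unset:bool=False, exclude_defaults:bool=False, exclude_none:bool=False, round_trip:bool=False, warnings:bool|Literal['none','warn','error']=True, fallback:Callable[[Any],Any]|None=None, serialize_as_any:bool=False, )→str#

Usage documentation: model_dump_json (Pydantic serialization concepts guide).

Generates a JSON representation of the model using Pydantic's to_json method.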

Parameters:

indent – Indentation to use in the JSON output. If None is passed, the output will be compact.
include – Field(s) to include in the JSON output.
exclude – Field(s) to exclude from the JSON output.
context – Additional context to pass to the serializer.
by_alias – Whether to serialize using field aliases.
exclude_unset – Whether to exclude fields that have not been explicitly set.
exclude_defaults – Whether to exclude fields that are set to their default value.
exclude_none – Whether to exclude fields that have a value of None.
round_trip – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings – How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors, and "error" raises a PydanticSerializationError.
fallback – A function to call when an unknown value is encountered. If not provided, a PydanticSerializationError is raised.
serialize_as_any – Whether to serialize fields with duck-typing serialization behavior.

Returns:

A JSON string representation of the model.
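A short sketch contrasting compact and indented output, assuming Pydantic v2 and a hypothetical CalibSketch stand-in modeled on CalibConfig's fields.

```python
import json
from typing import Literal

from pydantic import BaseModel, ConfigDict


# Hypothetical stand-in, not the real CalibConfig.
class CalibSketch(BaseModel):
    model_config = ConfigDict(extra="forbid")
    calib_batches: int = 512
    device: Literal["cuda", "cpu"] = "cuda"


cfg = CalibSketch(device="cpu")
compact = cfg.model_dump_json()  # indent=None: single-line JSON
pretty = cfg.model_dump_json(indent=2, exclude_defaults=True)

# Parsing back with the standard library confirms the content.
print(json.loads(compact))
print(json.loads(pretty))
```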

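classmethod model_json_schema( by_alias:bool=True, ref_template:str='#/$defs/{model}', schema_generator:type[GenerateJsonSchema]=GenerateJsonSchema, mode:JsonSchemaMode='validation', )→dict[str,Any]#

Generates a JSON schema for a model class.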

Parameters:

by_alias – Whether to use attribute aliases or not.
ref_template – The reference template.
schema_generator – To override the logic used to generate the JSON schema, as a subclass of GenerateJsonSchema with your desired modifications.
mode – The mode in which to generate the schema.

Returns:

The JSON schema for the given model class.
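A sketch of what model_json_schema() produces for a model configured like CalibConfig, assuming Pydantic v2; CalibSketch is a hypothetical stand-in. Note that extra='forbid' shows up as additionalProperties: false, which is handy for validating config files outside Python.

```python
from typing import Literal

from pydantic import BaseModel, ConfigDict


# Hypothetical stand-in, not the real CalibConfig.
class CalibSketch(BaseModel):
    model_config = ConfigDict(extra="forbid")
    calib_batches: int = 512
    device: Literal["cuda", "cpu"] = "cuda"


schema = CalibSketch.model_json_schema()
print(schema["properties"]["device"])
```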

classmethod model_parametrized_name( params:tuple[type[Any],...], )→str#

Compute the class name for parametrizations of generic classes.

This method can be overridden to achieve a custom naming scheme for generic BaseModels.

Parameters:: params – Tuple of types of the class. Given a generic class Model with 2 type variables and a concrete model Model[str, int], the value (str, int) would be passed to params.
Returns:: String representing the new class where params are passed to cls as type variables.
Raises:: TypeError – Raised when trying to generate concrete names for non-generic models.

model_post_init(context:Any,/)→None#: Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.
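A sketch of a cross-field check in model_post_init, assuming Pydantic v2. The rule that tokenizer_max_seq_length must not be smaller than calib_max_seq_length is a hypothetical example; CalibConfig is not documented here to enforce it, and LengthsSketch is a stand-in.

```python
from typing import Any

from pydantic import BaseModel


class LengthsSketch(BaseModel):
    calib_max_seq_length: int = 512
    tokenizer_max_seq_length: int = 2048

    def model_post_init(self, context: Any, /) -> None:
        # Runs after all fields are set, so both values are available here.
        # Hypothetical rule, shown only to illustrate a whole-model check.
        if self.tokenizer_max_seq_length < self.calib_max_seq_length:
            raise ValueError("tokenizer_max_seq_length must be >= calib_max_seq_length")


ok = LengthsSketch()
try:
    LengthsSketch(calib_max_seq_length=4096)
    rejected = False
except ValueError:  # Pydantic's ValidationError is also a ValueError subclass
    rejected = True
```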

classmethod model_rebuild( *, force:bool=False, raise_errors:bool=True, _parent_namespace_depth:int=2, _types_namespace:MappingNamespace|None=None, )→bool|None#

Try to rebuild the pydantic-core schema for the model.

This may be necessary when one of the annotations is a ForwardRef which could not be resolved during the initial attempt to build the schema, and automatic rebuilding fails.

Parameters:

force – Whether to force the rebuilding of the model schema, defaults to False.
raise_errors – Whether to raise errors, defaults to True.
_parent_namespace_depth – The depth level of the parent namespace, defaults to 2.
_types_namespace – The types namespace, defaults to None.

Returns:

Returns None if the schema is already "complete" and rebuilding was not required. If rebuilding was required, returns True if rebuilding was successful, otherwise False.

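classmethod model_validate( obj:Any, *, strict:bool|None=None, from_attributes:bool|None=None, context:Any|None=None, by_alias:bool|None=None, by_name:bool|None=None, )→Self#

Validate a pydantic model instance.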

Parameters:

obj – The object to validate.
strict – Whether to enforce types strictly.
from_attributes – Whether to extract data from object attributes.
context – Additional context to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Raises:

ValidationError – If the object could not be validated.

Returns:

The validated model instance.
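A sketch of model_validate() on a dict, assuming Pydantic v2 and a hypothetical CalibSketch stand-in with CalibConfig's extra='forbid' setting. It shows lax coercion, strict mode, and how a misspelled key is rejected rather than silently ignored.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


# Hypothetical stand-in, not the real CalibConfig.
class CalibSketch(BaseModel):
    model_config = ConfigDict(extra="forbid")
    calib_batches: int = 512


def rejects(data, **kwargs) -> bool:
    """Return True if model_validate raises ValidationError for this input."""
    try:
        CalibSketch.model_validate(data, **kwargs)
    except ValidationError:
        return True
    return False


ok = CalibSketch.model_validate({"calib_batches": "64"})  # lax mode coerces "64" -> 64
typo_rejected = rejects({"calib_batchs": 64})  # unknown key, forbidden by extra='forbid'
strict_rejected = rejects({"calib_batches": "64"}, strict=True)  # no str -> int coercion
```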

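classmethod model_validate_json( json_data:str|bytes|bytearray, *, strict:bool|None=None, context:Any|None=None, by_alias:bool|None=None, by_name:bool|None=None, )→Self#

Usage documentation: JSON Parsing (Pydantic JSON concepts guide).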

Validate the given JSON data against the Pydantic model.

Parameters:

json_data – The JSON data to validate.
strict – Whether to enforce types strictly.
context – Extra variables to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Returns:

The validated Pydantic model.

Raises:

ValidationError – If json_data is not a JSON string or the object could not be validated.
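A sketch of loading a config from a JSON string, assuming Pydantic v2; CalibSketch is a hypothetical stand-in using two CalibConfig field names. Both a disallowed value and malformed JSON surface as ValidationError.

```python
from typing import Literal

from pydantic import BaseModel, ConfigDict, ValidationError


# Hypothetical stand-in, not the real CalibConfig.
class CalibSketch(BaseModel):
    model_config = ConfigDict(extra="forbid")
    device: Literal["cuda", "cpu"] = "cuda"
    random_seed: int = 1234


def is_rejected(raw: str) -> bool:
    """Return True if model_validate_json raises ValidationError for this input."""
    try:
        CalibSketch.model_validate_json(raw)
    except ValidationError:
        return True
    return False


cfg = CalibSketch.model_validate_json('{"device": "cpu", "random_seed": 7}')
bad_device_rejected = is_rejected('{"device": "tpu"}')  # not an allowed Literal value
malformed_rejected = is_rejected('{"device": ')         # not valid JSON at all
```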

classmethod model_validate_strings( obj:Any, *, strict:bool|None=None, context:Any|None=None, by_alias:bool|None=None, by_name:bool|None=None, )→Self#

Validate the given object with string data against the Pydantic model.

Parameters:

obj – The object containing string data to validate.
strict – Whether to enforce types strictly.
context – Extra variables to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Returns:

The validated Pydantic model.

classmethod parse_file( path:str|Path, *, content_type:str|None=None, encoding:str='utf8', proto:DeprecatedParseProtocol|None=None, allow_pickle:bool=False, )→Self#

classmethod parse_obj(obj:Any)→Self#

classmethod parse_raw( b:str|bytes, *, content_type:str|None=None, encoding:str='utf8', proto:DeprecatedParseProtocol|None=None, allow_pickle:bool=False, )→Self#

classmethod schema( by_alias:bool=True, ref_template:str='#/$defs/{model}', )→Dict[str,Any]#

classmethod schema_json(
*,
by_alias:bool=True,
ref_template:str='#/$defs/{model}',
**dumps_kwargs:Any,
)→str#

to_dict()→dict[source]#

Dump a CalibConfig instance to a dict.

Returns:: The dict dumped from CalibConfig.
Return type:: dict

classmethod update_forward_refs(**localns:Any)→None#

classmethod validate(value:Any)→Self#

model_computed_fields={}#

model_config:ClassVar[ConfigDict]={'extra':'forbid'}#: Configuration for the model, should be a dictionary conforming to ConfigDict.

property model_extra:dict[str,Any]|None#

Get extra fields set during validation.

Returns:: A dictionary of extra fields, or None if config.extra is not set to "allow".

model_fields={'calib_batch_size':FieldInfo(annotation=int,required=False,default=1,description='The batch size that the calibration runs.'),'calib_batches':FieldInfo(annotation=int,required=False,default=512,description='The number of batches that the calibration runs.'),'calib_dataset':FieldInfo(annotation=str,required=False,default='cnn_dailymail',description='The name or local path of calibration dataset.'),'calib_max_seq_length':FieldInfo(annotation=int,required=False,default=512,description='The maximum sequence length that the calibration runs.'),'device':FieldInfo(annotation=Literal['cuda','cpu'],required=False,default='cuda',description='The device to run calibration.'),'random_seed':FieldInfo(annotation=int,required=False,default=1234,description='The random seed used for calibration.'),'tokenizer_max_seq_length':FieldInfo(annotation=int,required=False,default=2048,description='The maximum sequence length to initialize tokenizer for calibration.')}#

property model_fields_set:set[str]#

Returns the set of fields that have been explicitly set on this model instance.

Returns:

A set of strings representing the fields that have been set, i.e. that were not filled from defaults.

class tensorrt_llm.llmapi.BuildCacheConfig( cache_root:Path|None=None, max_records:int=10, max_cache_storage_gb:float=256, )[source]#

Bases:object

Configuration for the build cache.

cache_root#

The root directory for the build cache.

Type:: Path

max_records#

The maximum number of records to store in the cache.

Type:: int

max_cache_storage_gb#

The maximum amount of storage (in GB) to use for the cache.

Type:: float

Note

The build cache assumes that the model weights do not change during execution. If the weights do change, remove the cached entries manually.

__init__( cache_root:Path|None=None, max_records:int=10, max_cache_storage_gb:float=256, )[source]#

property cache_root:Path#

property max_cache_storage_gb:float#

property max_records:int#

class tensorrt_llm.llmapi.RequestError[source]#

Bases:RuntimeError

The error raised when a request fails.

__init__(*args,**kwargs)#

add_note()#: Exception.add_note(note) – add a note to the exception.

with_traceback()#: Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

args#
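Because RequestError derives from RuntimeError, a handler written for RuntimeError also catches it. The sketch below uses a local stand-in class and a hypothetical run_request() function rather than the TensorRT-LLM API, purely to show the inheritance relationship.

```python
class RequestError(RuntimeError):
    """Local stand-in mirroring the documented base class; not the TensorRT-LLM class."""


def run_request(prompt: str) -> str:
    # Hypothetical request function that fails for empty prompts.
    if not prompt:
        raise RequestError("empty prompt")
    return prompt.upper()


def safe_run(prompt: str) -> str:
    try:
        return run_request(prompt)
    except RuntimeError as err:  # also catches RequestError through inheritance
        return f"failed: {err}"
```

Catching RequestError specifically (instead of RuntimeError) keeps unrelated runtime failures from being swallowed.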

class tensorrt_llm.llmapi.MpiCommSession(comm=None,n_workers:int=1)[source]#

Bases:MpiSession

__init__(comm=None,n_workers:int=1)[source]#

abort()[source]#

get_comm()[source]#

is_comm_session()→bool#

shutdown(wait=True)[source]#

shutdown_abort(grace:float=60,reason=None)#

submit(
task:Callable[[...],T],
*args,
**kwargs,
)→List[Future[T]][source]#

Submit a task to MPI workers.

Parameters:

task – The task to be submitted.
args – Positional arguments for the task.
kwargs – Keyword arguments for the task.

submit_sync(
task:Callable[[...],T],
*args,
**kwargs,
)→List[T][source]#
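MpiCommSession needs an MPI environment, so the sketch below uses threads from concurrent.futures as stand-ins for workers to illustrate the documented return types: submit() returns a list of futures and submit_sync() a list of results. ThreadCommSessionSketch is hypothetical, and the assumption that each worker runs the task once (one future per worker) is ours rather than something stated above.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")


class ThreadCommSessionSketch:
    """Hypothetical stand-in: threads play the role of MPI workers."""

    def __init__(self, n_workers: int = 1):
        self.n_workers = n_workers
        self._pool = ThreadPoolExecutor(max_workers=n_workers)

    def submit(self, task: Callable[..., T], *args, **kwargs) -> List[Future]:
        # Assumption: every worker runs the same task once, so one future per worker.
        return [self._pool.submit(task, *args, **kwargs) for _ in range(self.n_workers)]

    def submit_sync(self, task: Callable[..., T], *args, **kwargs) -> List[T]:
        # Block until every worker's future resolves; results keep worker order.
        return [f.result() for f in self.submit(task, *args, **kwargs)]

    def shutdown(self, wait: bool = True) -> None:
        self._pool.shutdown(wait=wait)


session = ThreadCommSessionSketch(n_workers=4)
futures = session.submit(pow, 2, 10)
results = session.submit_sync(pow, 3, 2)
session.shutdown()
```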

class tensorrt_llm.llmapi.ExtendedRuntimePerfKnobConfig( *, multi_block_mode:bool=True, enable_context_fmha_fp32_acc:bool=False, cuda_graph_mode:bool=False, cuda_graph_cache_size:int=0, )[source]#

Bases:StrictBaseModel,PybindMirror

Configuration for extended runtime performance knobs.

field cuda_graph_cache_size:int=0#: Number of CUDA graphs to cache in the runtime. A larger cache can improve performance but consumes more GPU memory.

field cuda_graph_mode:bool=False#: Whether to use CUDA graph mode.

field enable_context_fmha_fp32_acc:bool=False#: Whether to enable FP32 accumulation in context FMHA.

field multi_block_mode:bool=True#: Whether to use multi-block mode.
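A sketch of how the four documented knobs behave as a strict Pydantic model, assuming Pydantic v2 and that StrictBaseModel amounts to extra='forbid' (consistent with model_config below). PerfKnobsSketch is a hypothetical stand-in, and the cache size of 4 is an arbitrary illustration, not a recommendation.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class PerfKnobsSketch(BaseModel):
    """Hypothetical stand-in with the documented fields, defaults, and extra='forbid'."""

    model_config = ConfigDict(extra="forbid")
    multi_block_mode: bool = True
    enable_context_fmha_fp32_acc: bool = False
    cuda_graph_mode: bool = False
    cuda_graph_cache_size: int = 0


# Enable CUDA graphs and cache a few of them; other knobs keep their defaults.
knobs = PerfKnobsSketch(cuda_graph_mode=True, cuda_graph_cache_size=4)

try:
    PerfKnobsSketch(cuda_graph_modes=True)  # misspelled field name
    typo_rejected = False
except ValidationError:
    typo_rejected = True
```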

class Config#

Bases:object

extra='forbid'#

__init__(**data:Any)→None#

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

classmethod construct(
_fields_set:set[str]|None=None,
**values:Any,
)→Self#

copy( *, include:AbstractSetIntStr|MappingIntStrAny|None=None, exclude:AbstractSetIntStr|MappingIntStrAny|None=None, update:Dict[str,Any]|None=None, deep:bool=False, )→Self#

Returns a copy of the model.

Deprecated: This method is now deprecated; use model_copy instead.

If you need include or exclude, use:

```python
data = self.model_dump(include=include, exclude=exclude, round_trip=True)
data = {**data, **(update or {})}
copied = self.model_validate(data)
```

Parameters:

include – Optional set or mapping specifying which fields to include in the copied model.
exclude – Optional set or mapping specifying which fields to exclude in the copied model.
update – Optional dictionary of field-value pairs to override field values in the copied model.
deep – If True, the values of fields that are Pydantic models will be deep-copied.

Returns:

A copy of the model with included, excluded and updated fields as specified.

dict( *, include:set[int]|set[str]|Mapping[int,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|Mapping[str,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|None=None, exclude:set[int]|set[str]|Mapping[int,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|Mapping[str,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|None=None, by_alias:bool=False, exclude_unset:bool=False, exclude_defaults:bool=False, exclude_none:bool=False, )→Dict[str,Any]#

classmethod from_orm( obj:Any, )→Self#

classmethod from_pybind( pybind_instance:PybindMirror, )→T#

Construct an instance of the given class from the fields in the given pybind class instance.

Parameters:

cls – Type of the class to construct; must be a subclass of pydantic BaseModel.
pybind_instance – Instance of the pybind class to construct from its fields.

Notes

When a field value is None in the pybind class, but the field is not optional and has a default value in the BaseModel class, it gets the default value defined in the BaseModel class.

Returns:: Instance of the given class, populated with the fields of the given pybind instance.
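The None-to-default rule in the Notes can be sketched with Pydantic v2 and a SimpleNamespace standing in for the pybind instance. from_pybind_sketch and KnobsSketch (including its optional note field) are hypothetical; the real from_pybind implementation is not shown on this page and may differ.

```python
from types import SimpleNamespace
from typing import Optional, get_args

from pydantic import BaseModel


class KnobsSketch(BaseModel):
    multi_block_mode: bool = True
    cuda_graph_cache_size: int = 0
    note: Optional[str] = "default note"  # hypothetical optional field


def from_pybind_sketch(cls, pybind_like):
    values = {}
    for name, field in cls.model_fields.items():
        value = getattr(pybind_like, name, None)
        allows_none = type(None) in get_args(field.annotation)
        # Rule from the Notes: None for a non-optional field that has a default
        # means "keep the BaseModel default", so the key is simply left out.
        if value is None and not allows_none and not field.is_required():
            continue
        values[name] = value
    return cls(**values)


native = SimpleNamespace(multi_block_mode=None, cuda_graph_cache_size=8, note=None)
knobs = from_pybind_sketch(KnobsSketch, native)
```

Optional fields keep an explicit None, which is why note ends up as None while multi_block_mode falls back to True.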

static get_pybind_enum_fields(pybind_class)#: Get all the enum fields from the pybind class.

static get_pybind_variable_fields(config_cls)#: Get all the variable fields from the pybind class.

json(
*,
include:set[int]|set[str]|Mapping[int,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|Mapping[str,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|None=None,
exclude:set[int]|set[str]|Mapping[int,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|Mapping[str,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|None=None,
by_alias:bool=False,
exclude_unset:bool=False,
exclude_defaults:bool=False,
exclude_none:bool=False,
encoder:Callable[[Any],Any]|None=PydanticUndefined,
models_as_dict:bool=PydanticUndefined,
**dumps_kwargs:Any,
)→str#

static maybe_to_pybind(ins)#

static mirror_pybind_enum(pybind_class)#: Mirror the enum fields from the pybind class to the Python class.

static mirror_pybind_fields(pybind_class)#

Class decorator that ensures Python class fields mirror those of a C++ class.

Parameters:: pybind_class – The C++ class whose fields should be mirrored
Returns:: A decorator function that validates field mirroring

classmethod model_construct(
_fields_set:set[str]|None=None,
**values:Any,
)→Self#

Creates a new instance of the Model class from trusted or pre-validated data.

Creates a new model, setting __dict__ and __pydantic_fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed.

Note: model_construct() generally respects the model_config.extra setting on the provided model. That is, if model_config.extra == 'allow', all extra passed values are added to the model instance's __dict__ and __pydantic_extra__ fields. If model_config.extra == 'ignore' (the default), all extra passed values are ignored. Because model_construct() performs no validation, model_config.extra == 'forbid' does not raise an error when extra values are passed; they are simply ignored.

Parameters:

_fields_set – A set of field names that were originally explicitly set during instantiation. If provided, this is used directly for the model_fields_set attribute. Otherwise, the field names from the values argument are used.
values – Trusted or pre-validated data dictionary.

Returns:

A new instance of the Model class, populated with the given data without validation.

model_copy( *, update:Mapping[str,Any]|None=None, deep:bool=False, )→Self#

Usage documentation: model_copy (Pydantic serialization concepts guide).

Returns a copy of the model.

Note: The underlying instance's __dict__ attribute is copied. This might have unexpected side effects if you store anything in it on top of the model fields (e.g. the value of cached properties created with functools.cached_property).

Parameters:

update – Values to change or add in the new model. Note: the data is not validated before the new model is created, so only pass trusted data.
deep – Set to True to make a deep copy of the model.

Returns:

New model instance.

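model_dump( *, mode:Literal['json','python']|str='python', include:IncEx|None=None, exclude:IncEx|None=None, context:Any|None=None, by_alias:bool|None=None, exclude_unset:bool=False, exclude_defaults:bool=False, exclude_none:bool=False, round_trip:bool=False, warnings:bool|Literal['none','warn','error']=True, fallback:Callable[[Any],Any]|None=None, serialize_as_any:bool=False, )→dict[str,Any]#

Usage documentation: model_dump (Pydantic serialization concepts guide).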

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

Parameters:

mode – The mode in which to_python should run. If mode is 'json', the output will only contain JSON-serializable types. If mode is 'python', the output may contain non-JSON-serializable Python objects.
include – A set of fields to include in the output.
exclude – A set of fields to exclude from the output.
context – Additional context to pass to the serializer.
by_alias – Whether to use the field’s alias in the dictionary key if defined.
exclude_unset – Whether to exclude fields that have not been explicitly set.
exclude_defaults – Whether to exclude fields that are set to their default value.
exclude_none – Whether to exclude fields that have a value of None.
round_trip – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings – How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors, and "error" raises a PydanticSerializationError.
fallback – A function to call when an unknown value is encountered. If not provided, a PydanticSerializationError is raised.
serialize_as_any – Whether to serialize fields with duck-typing serialization behavior.

Returns:

A dictionary representation of the model.

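model_dump_json( *, indent:int|None=None, include:IncEx|None=None, exclude:IncEx|None=None, context:Any|None=None, by_alias:bool|None=None, exclude_unset:bool=False, exclude_defaults:bool=False, exclude_none:bool=False, round_trip:bool=False, warnings:bool|Literal['none','warn','error']=True, fallback:Callable[[Any],Any]|None=None, serialize_as_any:bool=False, )→str#

Usage documentation: model_dump_json (Pydantic serialization concepts guide).

Generates a JSON representation of the model using Pydantic's to_json method.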

Parameters:

indent – Indentation to use in the JSON output. If None is passed, the output will be compact.
include – Field(s) to include in the JSON output.
exclude – Field(s) to exclude from the JSON output.
context – Additional context to pass to the serializer.
by_alias – Whether to serialize using field aliases.
exclude_unset – Whether to exclude fields that have not been explicitly set.
exclude_defaults – Whether to exclude fields that are set to their default value.
exclude_none – Whether to exclude fields that have a value of None.
round_trip – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings – How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors, and "error" raises a PydanticSerializationError.
fallback – A function to call when an unknown value is encountered. If not provided, a PydanticSerializationError is raised.
serialize_as_any – Whether to serialize fields with duck-typing serialization behavior.

Returns:

A JSON string representation of the model.

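classmethod model_json_schema( by_alias:bool=True, ref_template:str='#/$defs/{model}', schema_generator:type[GenerateJsonSchema]=GenerateJsonSchema, mode:JsonSchemaMode='validation', )→dict[str,Any]#

Generates a JSON schema for a model class.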

Parameters:

by_alias – Whether to use attribute aliases or not.
ref_template – The reference template.
schema_generator – To override the logic used to generate the JSON schema, as a subclass of GenerateJsonSchema with your desired modifications.
mode – The mode in which to generate the schema.

Returns:

The JSON schema for the given model class.

classmethod model_parametrized_name( params:tuple[type[Any],...], )→str#

Compute the class name for parametrizations of generic classes.

This method can be overridden to achieve a custom naming scheme for generic BaseModels.

Parameters:: params – Tuple of types of the class. Given a generic class Model with 2 type variables and a concrete model Model[str, int], the value (str, int) would be passed to params.
Returns:: String representing the new class where params are passed to cls as type variables.
Raises:: TypeError – Raised when trying to generate concrete names for non-generic models.

model_post_init( context:Any, /, )→None#: Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

classmethod model_rebuild( *, force:bool=False, raise_errors:bool=True, _parent_namespace_depth:int=2, _types_namespace:MappingNamespace|None=None, )→bool|None#

Try to rebuild the pydantic-core schema for the model.

This may be necessary when one of the annotations is a ForwardRef which could not be resolved during the initial attempt to build the schema, and automatic rebuilding fails.

Parameters:

force – Whether to force the rebuilding of the model schema, defaults to False.
raise_errors – Whether to raise errors, defaults to True.
_parent_namespace_depth – The depth level of the parent namespace, defaults to 2.
_types_namespace – The types namespace, defaults to None.

Returns:

Returns None if the schema is already "complete" and rebuilding was not required. If rebuilding was required, returns True if rebuilding was successful, otherwise False.

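classmethod model_validate( obj:Any, *, strict:bool|None=None, from_attributes:bool|None=None, context:Any|None=None, by_alias:bool|None=None, by_name:bool|None=None, )→Self#

Validate a pydantic model instance.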

Parameters:

obj – The object to validate.
strict – Whether to enforce types strictly.
from_attributes – Whether to extract data from object attributes.
context – Additional context to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Raises:

ValidationError – If the object could not be validated.

Returns:

The validated model instance.

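classmethod model_validate_json( json_data:str|bytes|bytearray, *, strict:bool|None=None, context:Any|None=None, by_alias:bool|None=None, by_name:bool|None=None, )→Self#

Usage documentation: JSON Parsing (Pydantic JSON concepts guide).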

Validate the given JSON data against the Pydantic model.

Parameters:

json_data – The JSON data to validate.
strict – Whether to enforce types strictly.
context – Extra variables to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Returns:

The validated Pydantic model.

Raises:

ValidationError – If json_data is not a JSON string or the object could not be validated.

classmethod model_validate_strings( obj:Any, *, strict:bool|None=None, context:Any|None=None, by_alias:bool|None=None, by_name:bool|None=None, )→Self#

Validate the given object with string data against the Pydantic model.

Parameters:

obj – The object containing string data to validate.
strict – Whether to enforce types strictly.
context – Extra variables to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Returns:

The validated Pydantic model.

classmethod parse_file( path:str|Path, *, content_type:str|None=None, encoding:str='utf8', proto:DeprecatedParseProtocol|None=None, allow_pickle:bool=False, )→Self#

classmethod parse_obj( obj:Any, )→Self#

classmethod parse_raw( b:str|bytes, *, content_type:str|None=None, encoding:str='utf8', proto:DeprecatedParseProtocol|None=None, allow_pickle:bool=False, )→Self#

static pybind_equals(obj0,obj1)#: Check if two pybind objects are equal.

classmethod schema( by_alias:bool=True, ref_template:str='#/$defs/{model}', )→Dict[str,Any]#

classmethod schema_json(
*,
by_alias:bool=True,
ref_template:str='#/$defs/{model}',
**dumps_kwargs:Any,
)→str#

classmethod update_forward_refs(
**localns:Any,
)→None#

classmethod validate( value:Any, )→Self#

model_computed_fields={}#

model_config:ClassVar[ConfigDict]={'extra':'forbid'}#: Configuration for the model, should be a dictionary conforming to ConfigDict.

property model_extra:dict[str,Any]|None#

Get extra fields set during validation.

Returns:: A dictionary of extra fields, or None if config.extra is not set to "allow".

model_fields={'cuda_graph_cache_size':FieldInfo(annotation=int,required=False,default=0,description='Number of cuda graphs to be cached in the runtime. The larger the cache, the better the perf, but more GPU memory is consumed.'),'cuda_graph_mode':FieldInfo(annotation=bool,required=False,default=False,description='Whether to use CUDA graph mode.'),'enable_context_fmha_fp32_acc':FieldInfo(annotation=bool,required=False,default=False,description='Whether to enable context FMHA FP32 accumulation.'),'multi_block_mode':FieldInfo(annotation=bool,required=False,default=True,description='Whether to use multi-block mode.')}#

property model_fields_set:set[str]#

Returns the set of fields that have been explicitly set on this model instance.

Returns:

A set of strings representing the fields that have been set, i.e. that were not filled from defaults.

class tensorrt_llm.llmapi.BatchingType( value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None, )[source]#

Bases:StrEnum
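Because BatchingType derives from StrEnum, each member is also a str, which is why the inherited str methods are listed below. A minimal sketch with a hypothetical enum; the member names STATIC and INFLIGHT are assumptions, not taken from this page, and the fallback class only covers interpreters older than Python 3.11.

```python
from enum import Enum

try:
    from enum import StrEnum  # Python 3.11+
except ImportError:  # minimal fallback for older interpreters
    class StrEnum(str, Enum):
        pass


class BatchingKindSketch(StrEnum):
    # Hypothetical members; check BatchingType itself for the real names and values.
    STATIC = "STATIC"
    INFLIGHT = "INFLIGHT"


mode = BatchingKindSketch("INFLIGHT")  # look up a member by its string value
print(mode == "INFLIGHT", mode.lower())
```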

__init__(*args,**kwds)#

capitalize()#

Return a capitalized version of the string.

More specifically, make the first character have upper case and the rest lowercase.

casefold()#: Return a version of the string suitable for caseless comparisons.

center(width,fillchar=' ',/)#

Return a centered string of length width.

Padding is done using the specified fill character (default is a space).

count(sub[,start[,end]])→int#: Return the number of non-overlapping occurrences of substring sub in string S[start:end]. Optional arguments start and end are interpreted as in slice notation.

encode(encoding='utf-8',errors='strict')#

Encode the string using the codec registered for encoding.

encoding: The encoding in which to encode the string.
errors: The error handling scheme to use for encoding errors. The default is 'strict' meaning that encoding errors raise a UnicodeEncodeError. Other possible values are 'ignore', 'replace' and 'xmlcharrefreplace' as well as any other name registered with codecs.register_error that can handle UnicodeEncodeErrors.

endswith(suffix[,start[,end]])→bool#: Return True if S ends with the specified suffix, False otherwise. With optional start, test S beginning at that position. With optional end, stop comparing S at that position. suffix can also be a tuple of strings to try.

expandtabs(tabsize=8)#

Return a copy where all tab characters are expanded using spaces.

If tabsize is not given, a tab size of 8 characters is assumed.

find(sub[,start[,end]])→int#

Return the lowest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.

Return -1 on failure.

format(*args,**kwargs)→str#: Return a formatted version of S, using substitutions from args and kwargs.The substitutions are identified by braces (‘{’ and ‘}’).

format_map(mapping)→str#: Return a formatted version of S, using substitutions from mapping.The substitutions are identified by braces (‘{’ and ‘}’).

index(sub[,start[,end]])→int#

Return the lowest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.

Raises ValueError when the substring is not found.
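The only behavioral difference between find() and index() is the failure case: find() returns -1, index() raises ValueError. A small sketch on a plain string (locate() is a hypothetical helper):

```python
def locate(haystack: str, needle: str) -> int:
    """Return the position of needle, or -1, going through index() to show the exception path."""
    try:
        return haystack.index(needle)
    except ValueError:  # index() raises where find() would return -1
        return -1


name = "cuda_graph_mode"
first_graph = name.find("graph")     # lowest index of the substring
after_five = name.find("_", 5)       # search starts at index 5
last_underscore = name.rfind("_")    # highest index instead of lowest
```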

isalnum()#

Return True if the string is an alpha-numeric string, False otherwise.

A string is alpha-numeric if all characters in the string are alpha-numeric and there is at least one character in the string.

isalpha()#

Return True if the string is an alphabetic string, False otherwise.

A string is alphabetic if all characters in the string are alphabetic and there is at least one character in the string.

isascii()#

Return True if all characters in the string are ASCII, False otherwise.

ASCII characters have code points in the range U+0000-U+007F. Empty string is ASCII too.

isdecimal()#

Return True if the string is a decimal string, False otherwise.

A string is a decimal string if all characters in the string are decimal and there is at least one character in the string.

isdigit()#

Return True if the string is a digit string, False otherwise.

A string is a digit string if all characters in the string are digits and there is at least one character in the string.

isidentifier()#

Return True if the string is a valid Python identifier, False otherwise.

Call keyword.iskeyword(s) to test whether string s is a reserved identifier, such as "def" or "class".

islower()#

Return True if the string is a lowercase string, False otherwise.

A string is lowercase if all cased characters in the string are lowercase and there is at least one cased character in the string.

isnumeric()#

Return True if the string is a numeric string, False otherwise.

A string is numeric if all characters in the string are numeric and there is at least one character in the string.

isprintable()#

Return True if the string is printable, False otherwise.

A string is printable if all of its characters are considered printable in repr() or if it is empty.

isspace()#

Return True if the string is a whitespace string, False otherwise.

A string is whitespace if all characters in the string are whitespace and there is at least one character in the string.

istitle()#

Return True if the string is a title-cased string, False otherwise.

In a title-cased string, upper- and title-case characters may only follow uncased characters and lowercase characters only cased ones.

isupper()#

Return True if the string is an uppercase string, False otherwise.

A string is uppercase if all cased characters in the string are uppercase and there is at least one cased character in the string.

join(iterable,/)#

Concatenate any number of strings.

The string whose method is called is inserted in between each given string. The result is returned as a new string.

Example: '.'.join(['ab', 'pq', 'rs']) -> 'ab.pq.rs'

ljust(width,fillchar=' ',/)#

Return a left-justified string of length width.

Padding is done using the specified fill character (default is a space).

lower()#: Return a copy of the string converted to lowercase.

lstrip(chars=None,/)#

Return a copy of the string with leading whitespace removed.

If chars is given and not None, remove characters in chars instead.

static maketrans()#

Return a translation table usable for str.translate().

partition(sep,/)#

Partition the string into three parts using the given separator.

This will search for the separator in the string. If the separator is found, returns a 3-tuple containing the part before the separator, the separator itself, and the part after it.

If the separator is not found, returns a 3-tuple containing the original string and two empty strings.

removeprefix(prefix,/)#

Return a str with the given prefix string removed if present.

If the string starts with the prefix string, return string[len(prefix):]. Otherwise, return a copy of the original string.

removesuffix(suffix,/)#

Return a str with the given suffix string removed if present.

If the string ends with the suffix string and that suffix is not empty, return string[:-len(suffix)]. Otherwise, return a copy of the original string.

replace(old,new,count=-1,/)#

Return a copy with all occurrences of substring old replaced by new.

count
Maximum number of occurrences to replace. -1 (the default value) means replace all occurrences.

If the optional argument count is given, only the first count occurrences are replaced.

rfind(sub[,start[,end]])→int#

Return the highest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.

Return -1 on failure.

rindex(sub[,start[,end]])→int#

Return the highest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.

Raises ValueError when the substring is not found.

rjust(width,fillchar=' ',/)#

Return a right-justified string of length width.

Padding is done using the specified fill character (default is a space).

rpartition(sep,/)#

Partition the string into three parts using the given separator.

If the separator is not found, returns a 3-tuple containing two empty strings and the original string.
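partition() and rpartition() differ only in which occurrence they split at and in where the original string lands when the separator is missing. A small example using a dotted module path as sample input:

```python
path = "tensorrt_llm.llmapi.LLM"

head, sep, tail = path.partition(".")      # split at the first "."
rhead, rsep, rtail = path.rpartition(".")  # split at the last "."

missing = path.partition("/")    # separator absent: (original, "", "")
rmissing = path.rpartition("/")  # separator absent: ("", "", original)
```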

rsplit(sep=None,maxsplit=-1)#

Return a list of the substrings in the string, using sep as the separator string.

sep
The separator used to split the string.
When set to None (the default value), will split on any whitespace character (including \n \r \t \f and spaces) and will discard empty strings from the result.
maxsplit
Maximum number of splits. -1 (the default value) means no limit.

Splitting starts at the end of the string and works to the front.

rstrip(chars=None,/)#

Return a copy of the string with trailing whitespace removed.

If chars is given and not None, remove characters in chars instead.

split(sep=None,maxsplit=-1)#

Return a list of the substrings in the string, using sep as the separator string.

sep
The separator used to split the string.
When set to None (the default value), will split on any whitespace character (including \n \r \t \f and spaces) and will discard empty strings from the result.
maxsplit
Maximum number of splits. -1 (the default value) means no limit.

Splitting starts at the front of the string and works to the end.

Note, str.split() is mainly useful for data that has been intentionally delimited. With natural text that includes punctuation, consider using the regular expression module.
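The behaviours described for split() and rsplit() (whitespace splitting with sep=None, empty fields with an explicit separator, and maxsplit counted from either end) can be checked directly; the inputs are arbitrary samples:

```python
# sep=None: split on runs of whitespace and drop empty strings.
words = "  key = value = more  ".split()

# An explicit separator keeps empty fields.
fields = "a,,b".split(",")

# maxsplit limits the number of splits; rsplit counts from the right.
left = "a=b=c".split("=", 1)
right = "a=b=c".rsplit("=", 1)
```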

splitlines(keepends=False)#

Return a list of the lines in the string, breaking at line boundaries.

Line breaks are not included in the resulting list unless keepends is given and true.

startswith(prefix[,start[,end]])→bool#: Return True if S starts with the specified prefix, False otherwise. With optional start, test S beginning at that position. With optional end, stop comparing S at that position. prefix can also be a tuple of strings to try.

strip(chars=None,/)#

Return a copy of the string with leading and trailing whitespace removed.

If chars is given and not None, remove characters in chars instead.

swapcase()#: Convert uppercase characters to lowercase and lowercase characters to uppercase.

title()#

Return a version of the string where each word is titlecased.

More specifically, words start with uppercased characters and all remaining cased characters have lower case.

translate(table,/)#

Replace each character in the string using the given translation table.

table
Translation table, which must be a mapping of Unicode ordinals to Unicode ordinals, strings, or None.
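translate() is usually paired with the static maketrans() helper listed above. The sketch below builds one table from two equal-length strings plus a deletion set, and one from a dict; the sample strings are arbitrary:

```python
# Map 'a' -> '1' and 'b' -> '2', and delete every 'x'.
table = str.maketrans("ab", "12", "x")
result = "abxab".translate(table)

# A dict keyed by characters (or ordinals) is also accepted;
# a value of None deletes the character.
table2 = str.maketrans({"-": "_", " ": None})
result2 = "max num-tokens".translate(table2)
```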

upper()#: Return a copy of the string converted to uppercase.

zfill(width,/)#

Pad a numeric string with zeros on the left, to fill a field of the given width.

The string is never truncated.
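zfill() keeps a leading sign in front of the padding and never shortens its input:

```python
padded = "42".zfill(5)        # zeros added on the left
signed = "-42".zfill(5)       # the sign stays in front of the zeros
too_long = "123456".zfill(3)  # already wider than width: returned unchanged
```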

INFLIGHT='INFLIGHT'#

STATIC='STATIC'#

class tensorrt_llm.llmapi.ContextChunkingPolicy(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]#

Bases: StrEnum

Context chunking policy.

__init__(*args,**kwds)#

capitalize()#

Return a capitalized version of the string.

More specifically, make the first character have upper case and the rest lowercase.

casefold()#: Return a version of the string suitable for caseless comparisons.
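casefold() is more aggressive than lower(), which matters for caseless comparison of non-ASCII text. For example, the German sharp s:

```python
word = "Straße"

lowered = word.lower()     # 'ß' is already lowercase, so it is kept
folded = word.casefold()   # casefold maps 'ß' to 'ss'

# Caseless comparison should fold both sides.
same = folded == "STRASSE".casefold()
```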

center(width,fillchar=' ',/)#

Return a centered string of length width.

Padding is done using the specified fill character (default is a space).

count(sub[,start[,end]])→int#: Return the number of non-overlapping occurrences of substring sub in string S[start:end]. Optional arguments start and end are interpreted as in slice notation.
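Three edge cases of count(): matches do not overlap, start and end restrict the search like a slice, and the empty substring matches at every position, including the end:

```python
s = "aaaa"

n = s.count("aa")             # non-overlapping matches at indices 0 and 2
n_slice = s.count("a", 1, 3)  # searches only s[1:3]
empty = "abc".count("")       # len("abc") + 1 positions
```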

encode(encoding='utf-8',errors='strict')#

Encode the string using the codec registered for encoding.

encoding: The encoding in which to encode the string.
errors: The error handling scheme to use for encoding errors. The default is 'strict' meaning that encoding errors raise a UnicodeEncodeError. Other possible values are 'ignore', 'replace' and 'xmlcharrefreplace' as well as any other name registered with codecs.register_error that can handle UnicodeEncodeErrors.
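The error-handling schemes named above, applied to a string that cannot be represented in ASCII:

```python
text = "café"

utf8 = text.encode("utf-8")                             # always representable
ignored = text.encode("ascii", errors="ignore")         # drop the character
replaced = text.encode("ascii", errors="replace")       # substitute '?'
xml = text.encode("ascii", errors="xmlcharrefreplace")  # numeric reference

failed_at = None
try:
    text.encode("ascii")  # errors='strict' is the default
except UnicodeEncodeError as exc:
    failed_at = exc.start  # index of the first unencodable character
```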

endswith(suffix[,start[,end]])→bool#: Return True if S ends with the specified suffix, False otherwise. With optional start, test S beginning at that position. With optional end, stop comparing S at that position. suffix can also be a tuple of strings to try.

expandtabs(tabsize=8)#

Return a copy where all tab characters are expanded using spaces.

If tabsize is not given, a tab size of 8 characters is assumed.
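expandtabs() pads to the next multiple of tabsize, so the number of spaces inserted depends on the current column rather than being a fixed width per tab:

```python
row = "a\tbb\tc"

default = row.expandtabs()  # tab stops at columns 8, 16, ...
narrow = row.expandtabs(4)  # tab stops at columns 4, 8, ...
```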

find(sub[,start[,end]])→int#

Return the lowest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.

Return -1 on failure.

format(*args,**kwargs)→str#: Return a formatted version of S, using substitutions from args and kwargs. The substitutions are identified by braces ('{' and '}').

format_map(mapping)→str#: Return a formatted version of S, using substitutions from mapping. The substitutions are identified by braces ('{' and '}').
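format_map() uses the mapping directly instead of unpacking it into keyword arguments as format(**mapping) would, so a dict subclass with __missing__ can supply values for absent keys. Defaulting below is a hypothetical helper written for this example:

```python
params = {"tp": 2, "pp": 1}
msg = "tp={tp}, pp={pp}".format_map(params)


class Defaulting(dict):
    """Hypothetical helper: missing keys render as '?' instead of raising KeyError."""

    def __missing__(self, key):
        return "?"


partial = "tp={tp}, cp={cp}".format_map(Defaulting(tp=2))
```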

index(sub[,start[,end]])→int#

Return the lowest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.

Raises ValueError when the substring is not found.
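The practical difference between the find() and index() families is how a miss is reported:

```python
s = "context_parallel_size"

pos = s.find("parallel")      # lowest index of the match
missing = s.find("pipeline")  # a miss returns -1
last = s.rfind("_")           # highest index of the match

try:
    s.index("pipeline")       # a miss raises instead
    raised = False
except ValueError:
    raised = True
```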

isalnum()#

Return True if the string is an alpha-numeric string, False otherwise.

A string is alpha-numeric if all characters in the string are alpha-numeric and there is at least one character in the string.

isalpha()#

Return True if the string is an alphabetic string, False otherwise.

A string is alphabetic if all characters in the string are alphabetic and there is at least one character in the string.

isascii()#

Return True if all characters in the string are ASCII, False otherwise.

ASCII characters have code points in the range U+0000-U+007F. Empty string is ASCII too.

isdecimal()#

Return True if the string is a decimal string, False otherwise.

A string is a decimal string if all characters in the string are decimal and there is at least one character in the string.

isdigit()#

Return True if the string is a digit string, False otherwise.

A string is a digit string if all characters in the string are digits and there is at least one character in the string.

isidentifier()#

Return True if the string is a valid Python identifier, False otherwise.

Call keyword.iskeyword(s) to test whether string s is a reserved identifier, such as "def" or "class".
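Combining isidentifier() with keyword.iskeyword(), as suggested above, keeps only names that can actually be used as Python identifiers; the candidate names are arbitrary:

```python
import keyword

candidates = ["max_num_tokens", "2fast", "class", "tp-size"]

# Syntactically valid identifiers (reserved words still pass this test).
valid = [c for c in candidates if c.isidentifier()]

# Drop reserved words such as "class".
usable = [c for c in valid if not keyword.iskeyword(c)]
```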

islower()#

Return True if the string is a lowercase string, False otherwise.

A string is lowercase if all cased characters in the string are lowercase and there is at least one cased character in the string.

isnumeric()#

Return True if the string is a numeric string, False otherwise.

A string is numeric if all characters in the string are numeric and there is at least one character in the string.
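isdecimal(), isdigit() and isnumeric() accept progressively wider sets of characters. Superscript two (U+00B2) and the vulgar fraction one half (U+00BD) show where they diverge:

```python
samples = ("42", "\u00b2", "\u00bd", "")

# (isdecimal, isdigit, isnumeric) for each sample string.
results = {s: (s.isdecimal(), s.isdigit(), s.isnumeric()) for s in samples}
```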

isprintable()#

Return True if the string is printable, False otherwise.

A string is printable if all of its characters are considered printable in repr() or if it is empty.

isspace()#

Return True if the string is a whitespace string, False otherwise.

A string is whitespace if all characters in the string are whitespace and there is at least one character in the string.

istitle()#

Return True if the string is a title-cased string, False otherwise.

In a title-cased string, upper- and title-case characters may only follow uncased characters and lowercase characters only cased ones.

isupper()#

Return True if the string is an uppercase string, False otherwise.

A string is uppercase if all cased characters in the string are uppercase and there is at least one cased character in the string.

join(iterable,/)#

Concatenate any number of strings.

The string whose method is called is inserted in between each given string. The result is returned as a new string.

Example: '.'.join(['ab', 'pq', 'rs']) -> 'ab.pq.rs'

ljust(width,fillchar=' ',/)#

Return a left-justified string of length width.

Padding is done using the specified fill character (default is a space).

lower()#: Return a copy of the string converted to lowercase.

lstrip(chars=None,/)#

Return a copy of the string with leading whitespace removed.

If chars is given and not None, remove characters in chars instead.

static maketrans()#

Return a translation table usable for str.translate().

partition(sep,/)#

Partition the string into three parts using the given separator.

This will search for the separator in the string. If the separator is found, returns a 3-tuple containing the part before the separator, the separator itself, and the part after it.

If the separator is not found, returns a 3-tuple containing the original string and two empty strings.

removeprefix(prefix,/)#

Return a str with the given prefix string removed if present.

If the string starts with the prefix string, return string[len(prefix):]. Otherwise, return a copy of the original string.

removesuffix(suffix,/)#

Return a str with the given suffix string removed if present.

If the string ends with the suffix string and that suffix is not empty, return string[:-len(suffix)]. Otherwise, return a copy of the original string.

replace(old,new,count=-1,/)#

Return a copy with all occurrences of substring old replaced by new.

count
Maximum number of occurrences to replace. -1 (the default value) means replace all occurrences.

If the optional argument count is given, only the first count occurrences are replaced.

rfind(sub[,start[,end]])→int#

Return the highest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.

Return -1 on failure.

rindex(sub[,start[,end]])→int#

Return the highest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.

Raises ValueError when the substring is not found.

rjust(width,fillchar=' ',/)#

Return a right-justified string of length width.

Padding is done using the specified fill character (default is a space).

rpartition(sep,/)#

Partition the string into three parts using the given separator.

If the separator is not found, returns a 3-tuple containing two empty strings and the original string.

rsplit(sep=None,maxsplit=-1)#

Return a list of the substrings in the string, using sep as the separator string.

sep
The separator used to split the string.
When set to None (the default value), will split on any whitespace character (including \n \r \t \f and spaces) and will discard empty strings from the result.
maxsplit
Maximum number of splits. -1 (the default value) means no limit.

Splitting starts at the end of the string and works to the front.

rstrip(chars=None,/)#

Return a copy of the string with trailing whitespace removed.

If chars is given and not None, remove characters in chars instead.

split(sep=None,maxsplit=-1)#

Return a list of the substrings in the string, using sep as the separator string.

sep
The separator used to split the string.
When set to None (the default value), will split on any whitespace character (including \n \r \t \f and spaces) and will discard empty strings from the result.
maxsplit
Maximum number of splits. -1 (the default value) means no limit.

Splitting starts at the front of the string and works to the end.

Note, str.split() is mainly useful for data that has been intentionally delimited. With natural text that includes punctuation, consider using the regular expression module.

splitlines(keepends=False)#

Return a list of the lines in the string, breaking at line boundaries.

Line breaks are not included in the resulting list unless keepends is given and true.
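splitlines() treats \r\n as a single boundary and, unlike split('\n'), does not produce an empty trailing element:

```python
text = "first\nsecond\r\nthird"

lines = text.splitlines()                   # line breaks removed
with_ends = text.splitlines(keepends=True)  # line breaks kept

trailing = "a\nb\n".splitlines()       # no empty last element
split_version = "a\nb\n".split("\n")   # split() yields a trailing ''
```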

startswith(prefix[,start[,end]])→bool#: Return True if S starts with the specified prefix, False otherwise. With optional start, test S beginning at that position. With optional end, stop comparing S at that position. prefix can also be a tuple of strings to try.
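The tuple form and the optional start and end arguments of startswith() (endswith() works the same way); the field name is used only as sample text:

```python
name = "kv_transfer_timeout_ms"

is_kv = name.startswith(("kv_", "cache_"))      # True if any prefix matches
from_offset = name.startswith("transfer", 3)    # test beginning at index 3
bounded = name.startswith("kv_transfer", 0, 5)  # only name[0:5] is compared
```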

strip(chars=None,/)#

Return a copy of the string with leading and trailing whitespace removed.

If chars is given and not None, remove characters in chars instead.

swapcase()#: Convert uppercase characters to lowercase and lowercase characters to uppercase.

title()#

Return a version of the string where each word is titlecased.

More specifically, words start with uppercased characters and all remaining cased characters have lower case.

translate(table,/)#

Replace each character in the string using the given translation table.

table
Translation table, which must be a mapping of Unicode ordinals to Unicode ordinals, strings, or None.

upper()#: Return a copy of the string converted to uppercase.

zfill(width,/)#

Pad a numeric string with zeros on the left, to fill a field of the given width.

The string is never truncated.

EQUAL_PROGRESS='EQUAL_PROGRESS'#

FIRST_COME_FIRST_SERVED='FIRST_COME_FIRST_SERVED'#
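ContextChunkingPolicy is a StrEnum whose members are the two names above. The snippet below does not import TensorRT-LLM: ChunkingPolicySketch is a hypothetical stand-in built as a (str, Enum) mixin with the same member names and values, used to show what a string-valued enum generally offers, namely lookup by value (for example from a config file) and equality with plain strings. That the real class is used in exactly this way is an assumption.

```python
from enum import Enum


class ChunkingPolicySketch(str, Enum):
    """Hypothetical stand-in mirroring ContextChunkingPolicy's documented members."""

    EQUAL_PROGRESS = "EQUAL_PROGRESS"
    FIRST_COME_FIRST_SERVED = "FIRST_COME_FIRST_SERVED"


# Look a member up by its string value.
policy = ChunkingPolicySketch("FIRST_COME_FIRST_SERVED")

# Members are str instances, so they compare equal to their value...
matches_plain_string = policy == "FIRST_COME_FIRST_SERVED"

# ...and the inherited str methods documented above operate on that value.
lowered = policy.lower()
```

With TensorRT-LLM installed, the real members would be referenced as ContextChunkingPolicy.EQUAL_PROGRESS and ContextChunkingPolicy.FIRST_COME_FIRST_SERVED.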

class tensorrt_llm.llmapi.DynamicBatchConfig(*, enable_batch_size_tuning:bool, enable_max_num_tokens_tuning:bool, dynamic_batch_moving_average_window:int)[source]#

Bases: StrictBaseModel, PybindMirror

Dynamic batch configuration.

Controls how batch size and token limits are dynamically adjusted at runtime.

field dynamic_batch_moving_average_window:int [Required]#: The window size of the moving average of input and output lengths that is used to calculate the dynamic batch size and max num tokens.

field enable_batch_size_tuning:bool [Required]#: Controls whether the batch size is tuned dynamically.

field enable_max_num_tokens_tuning:bool [Required]#: Controls whether the max num tokens is tuned dynamically.
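The field descriptions only say that a moving average over a window of recent input and output lengths feeds the dynamic batch size and max num tokens calculation. As a hedged illustration of what dynamic_batch_moving_average_window controls, the sketch below implements a plain fixed-window moving average. MovingAverage is a hypothetical helper; how TensorRT-LLM turns the average into batch limits is not described here and is not modelled.

```python
from collections import deque


class MovingAverage:
    """Hypothetical fixed-window moving average (illustration only).

    This is not TensorRT-LLM's implementation of dynamic batch tuning.
    """

    def __init__(self, window: int) -> None:
        if window <= 0:
            raise ValueError("window must be positive")
        # A bounded deque drops the oldest sample once `window` samples are held.
        self._samples = deque(maxlen=window)

    def update(self, value: float) -> float:
        """Record one observed length and return the current average."""
        self._samples.append(value)
        return sum(self._samples) / len(self._samples)


avg = MovingAverage(window=3)
history = [avg.update(length) for length in (100, 200, 300, 400)]
```

With TensorRT-LLM installed, the configuration itself takes keyword arguments only, and all three are required, e.g. DynamicBatchConfig(enable_batch_size_tuning=True, enable_max_num_tokens_tuning=True, dynamic_batch_moving_average_window=128). The window value there is illustrative, not a recommended default.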

class Config#

Bases: object

extra='forbid'#

__init__(**data:Any)→None#

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

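classmethod construct(
_fields_set:set[str]|None=None,
**values:Any,
)→Self#

Deprecated: use model_construct() instead.

copy(*,include=None,exclude=None,update=None,deep=False)→Self#

Returns a copy of the model.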

Warning (deprecated): This method is now deprecated; use model_copy instead.

If you need include or exclude, use:

```python
data = self.model_dump(include=include, exclude=exclude, round_trip=True)
data = {**data, **(update or {})}
copied = self.model_validate(data)
```

Parameters:

include – Optional set or mapping specifying which fields to include in the copied model.
exclude – Optional set or mapping specifying which fields to exclude in the copied model.
update – Optional dictionary of field-value pairs to override field values in the copied model.
deep – If True, the values of fields that are Pydantic models will be deep-copied.

Returns:

A copy of the model with included, excluded and updated fields as specified.

dict( *, include:set[int]|set[str]|Mapping[int,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|Mapping[str,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|None=None, exclude:set[int]|set[str]|Mapping[int,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|Mapping[str,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|None=None, by_alias:bool=False, exclude_unset:bool=False, exclude_defaults:bool=False, exclude_none:bool=False, )→Dict[str,Any]#

classmethod from_orm(obj:Any)→Self#

classmethod from_pybind(pybind_instance:PybindMirror)→T#

Construct an instance of the given class from the fields in the given pybind class instance.

Parameters:

cls – Type of the class to construct; must be a subclass of pydantic BaseModel.
pybind_instance – Instance of the pybind class to construct from its fields.

Notes

When a field value is None in the pybind class, but the field is not optional and has a default value in the BaseModel class, it gets the default value defined in the BaseModel class.

Returns:: Instance of the given class, populated with the fields of the given pybind instance.
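The None-handling rule in the note above can be sketched with stdlib dataclasses standing in for pydantic and a SimpleNamespace standing in for the pybind object. TargetConfig and from_source are hypothetical names; this illustrates the documented rule only and is not the library's from_pybind implementation.

```python
import dataclasses
import typing
from types import SimpleNamespace


@dataclasses.dataclass
class TargetConfig:
    """Hypothetical target class standing in for a pydantic BaseModel."""

    window: int = 64                         # not optional, has a default
    timeout_ms: typing.Optional[int] = None  # optional: None is a legal value


def _is_optional(annotation) -> bool:
    """True when the annotation admits None, e.g. Optional[int]."""
    return type(None) in typing.get_args(annotation)


def from_source(cls, source):
    """Hypothetical sketch of the documented rule, not the real from_pybind."""
    kwargs = {}
    for fld in dataclasses.fields(cls):
        value = getattr(source, fld.name)
        has_default = fld.default is not dataclasses.MISSING
        if value is None and not _is_optional(fld.type) and has_default:
            # None in the source, but the target field is non-optional:
            # fall back to the default declared on the target class.
            value = fld.default
        kwargs[fld.name] = value
    return cls(**kwargs)


# The source object reports None for both fields.
cfg = from_source(TargetConfig, SimpleNamespace(window=None, timeout_ms=None))
```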

static get_pybind_enum_fields(pybind_class)#: Get all the enum fields from the pybind class.

static get_pybind_variable_fields(config_cls)#: Get all the variable fields from the pybind class.

json(
*,
include:set[int]|set[str]|Mapping[int,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|Mapping[str,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|None=None,
exclude:set[int]|set[str]|Mapping[int,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|Mapping[str,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|None=None,
by_alias:bool=False,
exclude_unset:bool=False,
exclude_defaults:bool=False,
exclude_none:bool=False,
encoder:Callable[[Any],Any]|None=PydanticUndefined,
models_as_dict:bool=PydanticUndefined,
**dumps_kwargs:Any,
)→str#

static maybe_to_pybind(ins)#

static mirror_pybind_enum(pybind_class)#: Mirror the enum fields from the pybind class to the Python class.

static mirror_pybind_fields(pybind_class)#

Class decorator that ensures Python class fields mirror those of a C++ class.

Parameters:: pybind_class – The C++ class whose fields should be mirrored
Returns:: A decorator function that validates field mirroring

classmethod model_construct(
_fields_set:set[str]|None=None,
**values:Any,
)→Self#

Creates a new instance of the Model class from already-validated data.

Creates a new model, setting __dict__ and __pydantic_fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed.

Note: model_construct() generally respects the model_config.extra setting on the provided model. That is, if model_config.extra == 'allow', then all extra passed values are added to the model instance's __dict__ and __pydantic_extra__ fields. If model_config.extra == 'ignore' (the default), then all extra passed values are ignored. Because no validation is performed with a call to model_construct(), having model_config.extra == 'forbid' does not result in an error if extra values are passed, but they will be ignored.

Parameters:

_fields_set – A set of field names that were originally explicitly set during instantiation. If provided, this is directly used for the model_fields_set attribute. Otherwise, the field names from the values argument will be used.
values – Trusted or pre-validated data dictionary.

Returns:

A new instance of the Model class with validated data.

model_copy(*,update:Mapping[str,Any]|None=None,deep:bool=False)→Self#

Returns a copy of the model.

Note: The underlying instance's __dict__ attribute is copied. This might have unexpected side effects if you store anything in it on top of the model fields (e.g. the value of cached properties).

Parameters:

update – Values to change/add in the new model. Note: the data is not validated before creating the new model. You should trust this data.
deep – Set to True to make a deep copy of the model.

Returns:

New model instance.

model_dump(*,mode,include,exclude,context,by_alias,exclude_unset,exclude_defaults,exclude_none,round_trip,warnings,fallback,serialize_as_any)→dict[str,Any]#

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

Parameters:

mode – The mode in which to_python should run. If mode is 'json', the output will only contain JSON serializable types. If mode is 'python', the output may contain non-JSON-serializable Python objects.
include – A set of fields to include in the output.
exclude – A set of fields to exclude from the output.
context – Additional context to pass to the serializer.
by_alias – Whether to use the field’s alias in the dictionary key if defined.
exclude_unset – Whether to exclude fields that have not been explicitly set.
exclude_defaults – Whether to exclude fields that are set to their default value.
exclude_none – Whether to exclude fields that have a value of None.
round_trip – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings – How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors, "error" raises a pydantic_core.PydanticSerializationError.
fallback – A function to call when an unknown value is encountered. If not provided, a pydantic_core.PydanticSerializationError is raised.
serialize_as_any – Whether to serialize fields with duck-typing serialization behavior.

Returns:

A dictionary representation of the model.

model_dump_json(*,indent,include,exclude,context,by_alias,exclude_unset,exclude_defaults,exclude_none,round_trip,warnings,fallback,serialize_as_any)→str#

Generates a JSON representation of the model using Pydantic's to_json method.

Parameters:

indent – Indentation to use in the JSON output. If None is passed, the output will be compact.
include – Field(s) to include in the JSON output.
exclude – Field(s) to exclude from the JSON output.
context – Additional context to pass to the serializer.
by_alias – Whether to serialize using field aliases.
exclude_unset – Whether to exclude fields that have not been explicitly set.
exclude_defaults – Whether to exclude fields that are set to their default value.
exclude_none – Whether to exclude fields that have a value of None.
round_trip – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings – How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors, "error" raises a pydantic_core.PydanticSerializationError.
fallback – A function to call when an unknown value is encountered. If not provided, a pydantic_core.PydanticSerializationError is raised.
serialize_as_any – Whether to serialize fields with duck-typing serialization behavior.

Returns:

A JSON string representation of the model.

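classmethod model_json_schema(by_alias:bool=True,ref_template:str='#/$defs/{model}',schema_generator=GenerateJsonSchema,mode='validation')→dict[str,Any]#

Generates a JSON schema for a model class.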

Parameters:

by_alias – Whether to use attribute aliases or not.
ref_template – The reference template.
schema_generator – To override the logic used to generate the JSON schema, as a subclass of GenerateJsonSchema with your desired modifications.
mode – The mode in which to generate the schema.

Returns:

The JSON schema for the given model class.

classmethod model_parametrized_name(params:tuple[type[Any],...])→str#

Compute the class name for parametrizations of generic classes.

This method can be overridden to achieve a custom naming scheme for generic BaseModels.

Parameters:: params – Tuple of types of the class. Given a generic class Model with 2 type variables and a concrete model Model[str, int], the value (str, int) would be passed to params.
Returns:: String representing the new class where params are passed to cls as type variables.
Raises:: TypeError – Raised when trying to generate concrete names for non-generic models.

model_post_init(context:Any,/)→None#: Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

classmethod model_rebuild(*,force:bool=False,raise_errors:bool=True,_parent_namespace_depth:int=2,_types_namespace:MappingNamespace|None=None)→bool|None#

Try to rebuild the pydantic-core schema for the model.

This may be necessary when one of the annotations is a ForwardRef which could not be resolved during the initial attempt to build the schema, and automatic rebuilding fails.

Parameters:

force – Whether to force the rebuilding of the model schema, defaults to False.
raise_errors – Whether to raise errors, defaults to True.
_parent_namespace_depth – The depth level of the parent namespace, defaults to 2.
_types_namespace – The types namespace, defaults to None.

Returns:

Returns None if the schema is already "complete" and rebuilding was not required. If rebuilding was required, returns True if rebuilding was successful, otherwise False.

classmethod model_validate(obj,*,strict=None,from_attributes=None,context=None,by_alias=None,by_name=None)→Self#

Validate a pydantic model instance.

Parameters:

obj – The object to validate.
strict – Whether to enforce types strictly.
from_attributes – Whether to extract data from object attributes.
context – Additional context to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Raises:

ValidationError – If the object could not be validated.

Returns:

The validated model instance.

classmethod model_validate_json(json_data,*,strict=None,context=None,by_alias=None,by_name=None)→Self#

Validate the given JSON data against the Pydantic model.

Parameters:

json_data – The JSON data to validate.
strict – Whether to enforce types strictly.
context – Extra variables to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Returns:

The validated Pydantic model.

Raises:

ValidationError – If json_data is not a JSON string or the object could not be validated.

classmethod model_validate_strings(obj:Any,*,strict:bool|None=None,context:Any|None=None,by_alias:bool|None=None,by_name:bool|None=None)→Self#

Validate the given object with string data against the Pydantic model.

Parameters:

obj – The object containing string data to validate.
strict – Whether to enforce types strictly.
context – Extra variables to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Returns:

The validated Pydantic model.

classmethod parse_file(path:str|Path,*,content_type:str|None=None,encoding:str='utf8',proto:DeprecatedParseProtocol|None=None,allow_pickle:bool=False)→Self#

classmethod parse_obj(obj:Any)→Self#

classmethod parse_raw(b:str|bytes,*,content_type:str|None=None,encoding:str='utf8',proto:DeprecatedParseProtocol|None=None,allow_pickle:bool=False)→Self#

static pybind_equals(obj0,obj1)#: Check if two pybind objects are equal.

classmethod schema(by_alias:bool=True,ref_template:str='#/$defs/{model}')→Dict[str,Any]#

classmethod schema_json(
*,
by_alias:bool=True,
ref_template:str='#/$defs/{model}',
**dumps_kwargs:Any,
)→str#

classmethod update_forward_refs(**localns:Any)→None#

classmethod validate(value:Any)→Self#

model_computed_fields={}#

model_config:ClassVar[ConfigDict]={'extra':'forbid'}#: Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

property model_extra:dict[str,Any]|None#

Get extra fields set during validation.

Returns:: A dictionary of extra fields, or None if config.extra is not set to "allow".

model_fields={'dynamic_batch_moving_average_window':FieldInfo(annotation=int,required=True,description='The window size for moving average of input and output length which is used to calculate dynamic batch size and max num tokens'),'enable_batch_size_tuning':FieldInfo(annotation=bool,required=True,description='Controls if the batch size should be tuned dynamically'),'enable_max_num_tokens_tuning':FieldInfo(annotation=bool,required=True,description='Controls if the max num tokens should be tuned dynamically')}#

property model_fields_set:set[str]#

Returns the set of fields that have been explicitly set on this model instance.

Returns:

A set of strings representing the fields that have been set, i.e. that were not filled from defaults.

class tensorrt_llm.llmapi.CacheTransceiverConfig(*, backend:Literal['DEFAULT','UCX','NIXL','MPI']|None=None, max_tokens_in_buffer:int|None=None, kv_transfer_timeout_ms:Annotated[int|None,Gt(gt=0)]=None, kv_transfer_sender_future_timeout_ms:Annotated[int|None,Gt(gt=0)]=1000)[source]#

Bases: StrictBaseModel, PybindMirror

Configuration for the cache transceiver.

field backend:Literal['DEFAULT','UCX','NIXL','MPI']|None=None#: The communication backend type to use for the cache transceiver.

field kv_transfer_sender_future_timeout_ms:int|None=1000#

Timeout in milliseconds to wait for the sender future to be ready when the scheduled batch size is 0. This allows the request to be eventually cancelled, either by the user or because kv_transfer_timeout_ms is exceeded.

Constraints:

gt = 0

field kv_transfer_timeout_ms:int|None=None#

Timeout in milliseconds for KV cache transfer. Requests exceeding this timeout will be cancelled.

Constraints:

gt = 0

field max_tokens_in_buffer:int|None=None#: The maximum number of tokens the transfer buffer can hold.
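A stdlib-only mirror of the documented fields, defaults and gt = 0 constraints, useful for seeing which values are accepted without installing TensorRT-LLM. CacheTransceiverSketch is a hypothetical class; the real CacheTransceiverConfig is a pydantic model and reports violations through its own validation errors rather than ValueError.

```python
from dataclasses import dataclass
from typing import Literal, Optional, get_args

# Backends listed in the documented Literal type.
Backend = Literal["DEFAULT", "UCX", "NIXL", "MPI"]


@dataclass(frozen=True)
class CacheTransceiverSketch:
    """Hypothetical stdlib mirror of CacheTransceiverConfig's documented fields."""

    backend: Optional[Backend] = None
    max_tokens_in_buffer: Optional[int] = None
    kv_transfer_timeout_ms: Optional[int] = None                # gt = 0 when set
    kv_transfer_sender_future_timeout_ms: Optional[int] = 1000  # gt = 0 when set

    def __post_init__(self) -> None:
        # Reject backends outside the documented Literal values.
        if self.backend is not None and self.backend not in get_args(Backend):
            raise ValueError(f"unknown backend: {self.backend!r}")
        # Enforce the documented gt = 0 constraint on both timeouts.
        for name in ("kv_transfer_timeout_ms", "kv_transfer_sender_future_timeout_ms"):
            value = getattr(self, name)
            if value is not None and value <= 0:
                raise ValueError(f"{name} must be > 0, got {value}")


cfg = CacheTransceiverSketch(backend="UCX", kv_transfer_timeout_ms=5000)
```

With TensorRT-LLM installed, the equivalent call would be CacheTransceiverConfig(backend="UCX", kv_transfer_timeout_ms=5000), using keyword arguments as the signature requires; the timeout value is only an example.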

class Config#

Bases: object

extra='forbid'#

__init__(**data:Any)→None#

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

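classmethod construct(
_fields_set:set[str]|None=None,
**values:Any,
)→Self#

Deprecated: use model_construct() instead.

copy(*,include=None,exclude=None,update=None,deep=False)→Self#

Returns a copy of the model.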

Warning (deprecated): This method is now deprecated; use model_copy instead.

If you need include or exclude, use:

```python
data = self.model_dump(include=include, exclude=exclude, round_trip=True)
data = {**data, **(update or {})}
copied = self.model_validate(data)
```

Parameters:

include – Optional set or mapping specifying which fields to include in the copied model.
exclude – Optional set or mapping specifying which fields to exclude in the copied model.
update – Optional dictionary of field-value pairs to override field values in the copied model.
deep – If True, the values of fields that are Pydantic models will be deep-copied.

Returns:

A copy of the model with included, excluded and updated fields as specified.

dict( *, include:set[int]|set[str]|Mapping[int,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|Mapping[str,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|None=None, exclude:set[int]|set[str]|Mapping[int,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|Mapping[str,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|None=None, by_alias:bool=False, exclude_unset:bool=False, exclude_defaults:bool=False, exclude_none:bool=False, )→Dict[str,Any]#

classmethod from_orm(obj:Any)→Self#

classmethod from_pybind(pybind_instance:PybindMirror)→T#

Construct an instance of the given class from the fields in the given pybind class instance.

Parameters:

cls – Type of the class to construct; must be a subclass of pydantic BaseModel.
pybind_instance – Instance of the pybind class to construct from its fields.

Notes

When a field value is None in the pybind class, but the field is not optional and has a default value in the BaseModel class, it gets the default value defined in the BaseModel class.

Returns:: Instance of the given class, populated with the fields of the given pybind instance.

static get_pybind_enum_fields(pybind_class)#: Get all the enum fields from the pybind class.

static get_pybind_variable_fields(config_cls)#: Get all the variable fields from the pybind class.

json(
*,
include:set[int]|set[str]|Mapping[int,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|Mapping[str,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|None=None,
exclude:set[int]|set[str]|Mapping[int,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|Mapping[str,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|None=None,
by_alias:bool=False,
exclude_unset:bool=False,
exclude_defaults:bool=False,
exclude_none:bool=False,
encoder:Callable[[Any],Any]|None=PydanticUndefined,
models_as_dict:bool=PydanticUndefined,
**dumps_kwargs:Any,
)→str

static maybe_to_pybind(ins)

static mirror_pybind_enum(pybind_class): Mirror the enum fields from the pybind class to the Python class.

static mirror_pybind_fields(pybind_class)

Class decorator that ensures Python class fields mirror those of a C++ class.

Parameters:: pybind_class – The C++ class whose fields should be mirrored
Returns:: A decorator function that validates field mirroring
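A class decorator of this kind can be sketched as a decorator factory that compares field names. The sketch below is hypothetical: `mirror_fields`, `FakeBinding`, and `KvSettings` are illustrative names, a plain Python class with properties stands in for the C++ binding, and it only checks that each dataclass field exists on the bound class, whereas the real decorator may apply stricter rules.

```python
from dataclasses import dataclass, fields


def mirror_fields(binding_cls):
    """Hypothetical analogue of mirror_pybind_fields.

    Returns a class decorator that raises ValueError if the decorated
    dataclass declares a field that binding_cls does not expose.
    """
    exposed = {name for name in dir(binding_cls) if not name.startswith("_")}

    def decorator(cls):
        unknown = [f.name for f in fields(cls) if f.name not in exposed]
        if unknown:
            raise ValueError(
                f"{cls.__name__} fields not found on {binding_cls.__name__}: {unknown}")
        return cls

    return decorator


class FakeBinding:
    """Stands in for a pybind11 class that exposes its members as properties."""
    max_tokens = property(lambda self: 0)
    enable_block_reuse = property(lambda self: True)


@mirror_fields(FakeBinding)  # applied after @dataclass, so fields() works
@dataclass
class KvSettings:  # hypothetical Python-side mirror
    max_tokens: int = 0
    enable_block_reuse: bool = True
```

Validation happens once, at class-definition time, so a misspelled field fails at import rather than at the first use.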

classmethod model_construct(
_fields_set:set[str]|None=None,
**values:Any,
)→Self

Creates a new instance of the Model class from already-validated data.

Creates a new model setting __dict__ and __pydantic_fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed.

Note: model_construct() generally respects the model_config.extra setting on the provided model. That is, if model_config.extra == 'allow', then all extra passed values are added to the model instance's __dict__ and __pydantic_extra__ fields. If model_config.extra == 'ignore' (the default), then all extra passed values are ignored. Because no validation is performed with a call to model_construct(), having model_config.extra == 'forbid' does not result in an error if extra values are passed, but they will be ignored.

Parameters:

_fields_set – A set of field names that were originally explicitly set during instantiation. If provided, this is used directly for the model_fields_set attribute. Otherwise, the field names from the values argument are used.
values – Trusted or pre-validated data dictionary.

Returns:

A new instance of the Model class, built from the provided pre-validated data.

model_copy( *, update:Mapping[str,Any]|None=None, deep:bool=False, )→Self

Usage documentation: see the model_copy section of the Pydantic serialization concepts guide.

Returns a copy of the model.

Note: The underlying instance's __dict__ attribute is copied. This might have unexpected side effects if you store anything in it on top of the model fields (e.g. the value of cached properties).

Parameters:

update – Values to change or add in the new model. Note: the data is not validated before creating the new model, so you should trust this data.
deep – Set to True to make a deep copy of the model.

Returns:

New model instance.

model_dump

Usage documentation: see the model_dump section of the Pydantic serialization concepts guide.

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

Parameters:

mode – The mode in which to_python should run. If mode is 'json', the output will only contain JSON-serializable types. If mode is 'python', the output may contain non-JSON-serializable Python objects.
include – A set of fields to include in the output.
exclude – A set of fields to exclude from the output.
context – Additional context to pass to the serializer.
by_alias – Whether to use the field’s alias in the dictionary key if defined.
exclude_unset – Whether to exclude fields that have not been explicitly set.
exclude_defaults – Whether to exclude fields that are set to their default value.
exclude_none – Whether to exclude fields that have a value of None.
round_trip – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings – How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors, and "error" raises a pydantic_core.PydanticSerializationError.
fallback – A function to call when an unknown value is encountered. If not provided, a pydantic_core.PydanticSerializationError is raised.
serialize_as_any – Whether to serialize fields with duck-typing serialization behavior.

Returns:

A dictionary representation of the model.
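The difference between exclude_unset, exclude_defaults, and exclude_none is easy to blur. The stdlib sketch below mimics the three filters on a dataclass; `Settings` and `dump` are hypothetical stand-ins, and the explicitly set field names are tracked by hand where Pydantic would use model_fields_set.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Optional


@dataclass
class Settings:  # hypothetical stand-in for a Pydantic model
    backend: Optional[str] = None
    timeout_ms: int = 1000
    buffer_tokens: Optional[int] = None
    # Names the caller set explicitly; Pydantic tracks this as model_fields_set.
    fields_set: set = field(default_factory=set, repr=False, compare=False)


def dump(obj, *, exclude_unset=False, exclude_defaults=False, exclude_none=False) -> dict[str, Any]:
    out = {}
    for f in fields(obj):
        if f.name == "fields_set":
            continue  # bookkeeping only, not a model field
        value = getattr(obj, f.name)
        if exclude_unset and f.name not in obj.fields_set:
            continue
        if exclude_defaults and value == f.default:
            continue
        if exclude_none and value is None:
            continue
        out[f.name] = value
    return out


s = Settings(backend=None, timeout_ms=1000, fields_set={"backend", "timeout_ms"})
print(dump(s))                         # {'backend': None, 'timeout_ms': 1000, 'buffer_tokens': None}
print(dump(s, exclude_unset=True))     # {'backend': None, 'timeout_ms': 1000}
print(dump(s, exclude_defaults=True))  # {}
print(dump(s, exclude_none=True))      # {'timeout_ms': 1000}
```

Note that `backend=None` survives exclude_unset (it was set explicitly) but is dropped by both exclude_defaults and exclude_none.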

model_dump_json

Usage documentation: see the model_dump_json section of the Pydantic serialization concepts guide.

Generates a JSON representation of the model using Pydantic's to_json method.

Parameters:

indent – Indentation to use in the JSON output. If None is passed, the output will be compact.
include – Field(s) to include in the JSON output.
exclude – Field(s) to exclude from the JSON output.
context – Additional context to pass to the serializer.
by_alias – Whether to serialize using field aliases.
exclude_unset – Whether to exclude fields that have not been explicitly set.
exclude_defaults – Whether to exclude fields that are set to their default value.
exclude_none – Whether to exclude fields that have a value of None.
round_trip – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings – How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors, and "error" raises a pydantic_core.PydanticSerializationError.
fallback – A function to call when an unknown value is encountered. If not provided, a pydantic_core.PydanticSerializationError is raised.
serialize_as_any – Whether to serialize fields with duck-typing serialization behavior.

Returns:

A JSON string representation of the model.

classmethod model_json_schema

Generates a JSON schema for a model class.

Parameters:

by_alias – Whether to use attribute aliases or not.
ref_template – The reference template.
schema_generator – A subclass of GenerateJsonSchema with your desired modifications, used to override the logic that generates the JSON schema.
mode – The mode in which to generate the schema.

Returns:

The JSON schema for the given model class.

classmethod model_parametrized_name( params:tuple[type[Any],...], )→str

Compute the class name for parametrizations of generic classes.

This method can be overridden to achieve a custom naming scheme for generic BaseModels.

Parameters:: params – Tuple of types of the class. Given a generic class Model with 2 type variables and a concrete model Model[str, int], the value (str, int) would be passed to params.
Returns:: String representing the new class where params are passed to cls as type variables.
Raises:: TypeError – Raised when trying to generate concrete names for non-generic models.

model_post_init( context:Any, /, )→None: Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

classmethod model_rebuild( *, force:bool=False, raise_errors:bool=True, _parent_namespace_depth:int=2, _types_namespace:MappingNamespace|None=None, )→bool|None

Try to rebuild the pydantic-core schema for the model.

This may be necessary when one of the annotations is a ForwardRef that could not be resolved during the initial attempt to build the schema, and automatic rebuilding fails.

Parameters:

force – Whether to force the rebuilding of the model schema, defaults to False.
raise_errors – Whether to raise errors, defaults to True.
_parent_namespace_depth – The depth level of the parent namespace, defaults to 2.
_types_namespace – The types namespace, defaults to None.

Returns:

Returns None if the schema is already "complete" and rebuilding was not required. If rebuilding was required, returns True if rebuilding was successful, otherwise False.

classmethod model_validate

Validate a pydantic model instance.

Parameters:

obj – The object to validate.
strict – Whether to enforce types strictly.
from_attributes – Whether to extract data from object attributes.
context – Additional context to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Raises:

ValidationError – If the object could not be validated.

Returns:

The validated model instance.

classmethod model_validate_json

Usage documentation: see the JSON parsing section of the Pydantic JSON concepts guide.

Validate the given JSON data against the Pydantic model.

Parameters:

json_data – The JSON data to validate.
strict – Whether to enforce types strictly.
context – Extra variables to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Returns:

The validated Pydantic model.

Raises:

ValidationError – If json_data is not a JSON string or the object could not be validated.

classmethod model_validate_strings( obj:Any, *, strict:bool|None=None, context:Any|None=None, by_alias:bool|None=None, by_name:bool|None=None, )→Self

Validate the given object with string data against the Pydantic model.

Parameters:

obj – The object containing string data to validate.
strict – Whether to enforce types strictly.
context – Extra variables to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Returns:

The validated Pydantic model.

classmethod parse_file( path:str|Path, *, content_type:str|None=None, encoding:str='utf8', proto:DeprecatedParseProtocol|None=None, allow_pickle:bool=False, )→Self

classmethod parse_obj(obj:Any)→Self

classmethod parse_raw( b:str|bytes, *, content_type:str|None=None, encoding:str='utf8', proto:DeprecatedParseProtocol|None=None, allow_pickle:bool=False, )→Self

static pybind_equals(obj0,obj1): Check if two pybind objects are equal.

classmethod schema( by_alias:bool=True, ref_template:str='#/$defs/{model}', )→Dict[str,Any]

classmethod schema_json(
*,
by_alias:bool=True,
ref_template:str='#/$defs/{model}',
**dumps_kwargs:Any,
)→str

classmethod update_forward_refs(
**localns:Any,
)→None

classmethod validate(value:Any)→Self

model_computed_fields={}

model_config:ClassVar[ConfigDict]={'extra':'forbid'}: Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict.

property model_extra:dict[str,Any]|None

Get extra fields set during validation.

Returns:: A dictionary of extra fields, or None if config.extra is not set to "allow".

model_fields = {
'backend': FieldInfo(annotation=Union[Literal['DEFAULT', 'UCX', 'NIXL', 'MPI'], NoneType], required=False, default=None, description='The communication backend type to use for the cache transceiver.'),
'kv_transfer_sender_future_timeout_ms': FieldInfo(annotation=Union[int, NoneType], required=False, default=1000, description='Timeout in milliseconds to wait for the sender future to be ready when scheduled batch size is 0. This allows the request to be eventually cancelled by the user or because of kv_transfer_timeout_ms', metadata=[Gt(gt=0)]),
'kv_transfer_timeout_ms': FieldInfo(annotation=Union[int, NoneType], required=False, default=None, description='Timeout in milliseconds for KV cache transfer. Requests exceeding this timeout will be cancelled.', metadata=[Gt(gt=0)]),
'max_tokens_in_buffer': FieldInfo(annotation=Union[int, NoneType], required=False, default=None, description='The max number of tokens the transfer buffer can fit.')
}
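The FieldInfo entries above encode two kinds of constraint: backend must be one of the listed literals (or None), and both timeout fields carry Gt(gt=0) metadata. A minimal stdlib sketch of those checks follows. `CacheTransceiverSettings` is a hypothetical stand-in, not the real Pydantic model, and the sketch assumes that None bypasses the > 0 check.

```python
from dataclasses import dataclass
from typing import Literal, Optional, get_args

Backend = Literal["DEFAULT", "UCX", "NIXL", "MPI"]


@dataclass
class CacheTransceiverSettings:  # hypothetical stand-in, not the real model
    backend: Optional[Backend] = None
    kv_transfer_sender_future_timeout_ms: Optional[int] = 1000
    kv_transfer_timeout_ms: Optional[int] = None
    max_tokens_in_buffer: Optional[int] = None

    def __post_init__(self):
        # Literal['DEFAULT', 'UCX', 'NIXL', 'MPI'] | None
        if self.backend is not None and self.backend not in get_args(Backend):
            raise ValueError(f"unsupported backend: {self.backend!r}")
        # Gt(gt=0): assumed to apply only when a value is given.
        for name in ("kv_transfer_sender_future_timeout_ms", "kv_transfer_timeout_ms"):
            value = getattr(self, name)
            if value is not None and value <= 0:
                raise ValueError(f"{name} must be > 0, got {value}")


print(CacheTransceiverSettings(backend="UCX", kv_transfer_timeout_ms=5000))
```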

property model_fields_set:set[str]

Returns the set of fields that have been explicitly set on this model instance.

Returns:

A set of strings representing the fields that have been set, i.e. that were not filled from defaults.

class tensorrt_llm.llmapi.NGramDecodingConfig( *, max_draft_len:int|None=None, max_total_draft_tokens:int|None=None, speculative_model_dir:str|Path|None=None, max_concurrency:int|None=None, draft_len_schedule:dict[int,int]|None=None, load_format:str|None=None, acceptance_window:int|None=None, acceptance_length_threshold:float|None=None, max_matching_ngram_size:int=0, is_keep_all:bool=True, is_use_oldest:bool=True, is_public_pool:bool=True, )

Bases: DecodingBaseConfig

Configuration for NGram drafter speculative decoding.

Parameters:

max_draft_len – int. The maximum number of draft tokens (that is, the maximum length of the output draft).
max_matching_ngram_size – int. The maximum length of the token pattern to search for (that is, the maximum number of input tokens used for matching).
is_keep_all – bool = True. Whether to keep all candidate pattern-match pairs; if False, only one match is kept for each pattern.
is_use_oldest – bool = True. Whether to return the oldest match when a pattern is hit; if False, the newest match is returned.
is_public_pool – bool = True. Whether to use a common pool shared by all requests; if False, each request has its own private pool.
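To make these parameters concrete, here is a toy prompt-lookup drafter in plain Python. It is a hypothetical sketch, not TensorRT-LLM's NGram implementation. `NGramPool` and its methods are illustrative names, continuations are always exactly max_draft_len tokens long, and is_public_pool would correspond only to whether callers share one `NGramPool` object or create one per request.

```python
from collections import defaultdict


class NGramPool:
    """Toy prompt-lookup drafter (hypothetical, not TensorRT-LLM's code).

    Maps each pattern of 1..max_matching_ngram_size tokens to the
    max_draft_len tokens that followed it earlier in the sequence.
    """

    def __init__(self, max_draft_len, max_matching_ngram_size,
                 is_keep_all=True, is_use_oldest=True):
        self.max_draft_len = max_draft_len
        self.max_ngram = max_matching_ngram_size
        self.is_keep_all = is_keep_all
        self.is_use_oldest = is_use_oldest
        self.pool = defaultdict(list)  # pattern -> continuations, oldest first

    def update(self, tokens):
        """Record every (pattern, continuation) pair found in tokens."""
        for size in range(1, self.max_ngram + 1):
            for start in range(len(tokens) - size - self.max_draft_len + 1):
                pattern = tuple(tokens[start:start + size])
                cont = tuple(tokens[start + size:start + size + self.max_draft_len])
                matches = self.pool[pattern]
                if cont in matches:
                    continue
                if not self.is_keep_all:
                    matches.clear()  # keep a single (the newest) match
                matches.append(cont)

    def propose(self, tokens):
        """Draft tokens for the longest pattern that matches the suffix."""
        for size in range(min(self.max_ngram, len(tokens)), 0, -1):
            matches = self.pool.get(tuple(tokens[-size:]))
            if matches:
                chosen = matches[0] if self.is_use_oldest else matches[-1]
                return list(chosen)
        return []  # no match: fall back to normal decoding


tokens = [1, 2, 3, 4, 1, 2, 5, 6, 1, 2]
drafter = NGramPool(max_draft_len=2, max_matching_ngram_size=2)
drafter.update(tokens)
print(drafter.propose(tokens))  # [3, 4]: oldest continuation of (1, 2)
```

With is_use_oldest=False the same input yields [5, 6], and with is_keep_all=False only the newest continuation of (1, 2) is stored in the first place.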

field acceptance_length_threshold:float|None=None

field acceptance_window:int|None=None

field draft_len_schedule:dict[int,int]|None=None

field is_keep_all:bool=True

field is_public_pool:bool=True

field is_use_oldest:bool=True

field load_format:str|None=None

field max_concurrency:int|None=None

field max_draft_len:int|None=None

field max_matching_ngram_size:int=0

field max_total_draft_tokens:int|None=None

field speculative_model_dir:str|Path|None=None

class Config

Bases: object

extra='forbid'

__init__(**kwargs)

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

classmethod construct(
_fields_set:set[str]|None=None,
**values:Any,
)→Self

copy(*, include=None, exclude=None, update=None, deep=False)→Self

Returns a copy of the model.

Deprecated: This method is deprecated; use model_copy instead.

If you need include or exclude, use:

```python
data = self.model_dump(include=include, exclude=exclude, round_trip=True)
data = {**data, **(update or {})}
copied = self.model_validate(data)
```

Parameters:

include – Optional set or mapping specifying which fields to include in the copied model.
exclude – Optional set or mapping specifying which fields to exclude in the copied model.
update – Optional dictionary of field-value pairs to override field values in the copied model.
deep – If True, the values of fields that are Pydantic models will be deep-copied.

Returns:

A copy of the model with included, excluded and updated fields as specified.

dict( *, include:set[int]|set[str]|Mapping[int,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|Mapping[str,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|None=None, exclude:set[int]|set[str]|Mapping[int,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|Mapping[str,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|None=None, by_alias:bool=False, exclude_unset:bool=False, exclude_defaults:bool=False, exclude_none:bool=False, )→Dict[str,Any]

classmethod from_dict(data:dict)

classmethod from_orm(obj:Any)→Self

json(
*,
include:set[int]|set[str]|Mapping[int,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|Mapping[str,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|None=None,
exclude:set[int]|set[str]|Mapping[int,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|Mapping[str,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|None=None,
by_alias:bool=False,
exclude_unset:bool=False,
exclude_defaults:bool=False,
exclude_none:bool=False,
encoder:Callable[[Any],Any]|None=PydanticUndefined,
models_as_dict:bool=PydanticUndefined,
**dumps_kwargs:Any,
)→str

classmethod model_construct(
_fields_set:set[str]|None=None,
**values:Any,
)→Self

Creates a new instance of the Model class from already-validated data.

Creates a new model setting __dict__ and __pydantic_fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed.

Note: model_construct() generally respects the model_config.extra setting on the provided model. That is, if model_config.extra == 'allow', then all extra passed values are added to the model instance's __dict__ and __pydantic_extra__ fields. If model_config.extra == 'ignore' (the default), then all extra passed values are ignored. Because no validation is performed with a call to model_construct(), having model_config.extra == 'forbid' does not result in an error if extra values are passed, but they will be ignored.

Parameters:

_fields_set – A set of field names that were originally explicitly set during instantiation. If provided, this is used directly for the model_fields_set attribute. Otherwise, the field names from the values argument are used.
values – Trusted or pre-validated data dictionary.

Returns:

A new instance of the Model class, built from the provided pre-validated data.

model_copy( *, update:Mapping[str,Any]|None=None, deep:bool=False, )→Self

Usage documentation: see the model_copy section of the Pydantic serialization concepts guide.

Returns a copy of the model.

Note: The underlying instance's __dict__ attribute is copied. This might have unexpected side effects if you store anything in it on top of the model fields (e.g. the value of cached properties).

Parameters:

update – Values to change or add in the new model. Note: the data is not validated before creating the new model, so you should trust this data.
deep – Set to True to make a deep copy of the model.

Returns:

New model instance.

model_dump

Usage documentation: see the model_dump section of the Pydantic serialization concepts guide.

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

Parameters:

mode – The mode in which to_python should run. If mode is 'json', the output will only contain JSON-serializable types. If mode is 'python', the output may contain non-JSON-serializable Python objects.
include – A set of fields to include in the output.
exclude – A set of fields to exclude from the output.
context – Additional context to pass to the serializer.
by_alias – Whether to use the field’s alias in the dictionary key if defined.
exclude_unset – Whether to exclude fields that have not been explicitly set.
exclude_defaults – Whether to exclude fields that are set to their default value.
exclude_none – Whether to exclude fields that have a value of None.
round_trip – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings – How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors, and "error" raises a pydantic_core.PydanticSerializationError.
fallback – A function to call when an unknown value is encountered. If not provided, a pydantic_core.PydanticSerializationError is raised.
serialize_as_any – Whether to serialize fields with duck-typing serialization behavior.

Returns:

A dictionary representation of the model.

model_dump_json

Usage documentation: see the model_dump_json section of the Pydantic serialization concepts guide.

Generates a JSON representation of the model using Pydantic's to_json method.

Parameters:

indent – Indentation to use in the JSON output. If None is passed, the output will be compact.
include – Field(s) to include in the JSON output.
exclude – Field(s) to exclude from the JSON output.
context – Additional context to pass to the serializer.
by_alias – Whether to serialize using field aliases.
exclude_unset – Whether to exclude fields that have not been explicitly set.
exclude_defaults – Whether to exclude fields that are set to their default value.
exclude_none – Whether to exclude fields that have a value of None.
round_trip – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings – How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors, and "error" raises a pydantic_core.PydanticSerializationError.
fallback – A function to call when an unknown value is encountered. If not provided, a pydantic_core.PydanticSerializationError is raised.
serialize_as_any – Whether to serialize fields with duck-typing serialization behavior.

Returns:

A JSON string representation of the model.

classmethod model_json_schema

Generates a JSON schema for a model class.

Parameters:

by_alias – Whether to use attribute aliases or not.
ref_template – The reference template.
schema_generator – A subclass of GenerateJsonSchema with your desired modifications, used to override the logic that generates the JSON schema.
mode – The mode in which to generate the schema.

Returns:

The JSON schema for the given model class.

classmethod model_parametrized_name( params:tuple[type[Any],...], )→str

Compute the class name for parametrizations of generic classes.

This method can be overridden to achieve a custom naming scheme for generic BaseModels.

Parameters:: params – Tuple of types of the class. Given a generic class Model with 2 type variables and a concrete model Model[str, int], the value (str, int) would be passed to params.
Returns:: String representing the new class where params are passed to cls as type variables.
Raises:: TypeError – Raised when trying to generate concrete names for non-generic models.

model_post_init(context:Any,/)→None

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:

self – The BaseModel instance.
context – The context.

classmethod model_rebuild( *, force:bool=False, raise_errors:bool=True, _parent_namespace_depth:int=2, _types_namespace:MappingNamespace|None=None, )→bool|None

Try to rebuild the pydantic-core schema for the model.

This may be necessary when one of the annotations is a ForwardRef that could not be resolved during the initial attempt to build the schema, and automatic rebuilding fails.

Parameters:

force – Whether to force the rebuilding of the model schema, defaults to False.
raise_errors – Whether to raise errors, defaults to True.
_parent_namespace_depth – The depth level of the parent namespace, defaults to 2.
_types_namespace – The types namespace, defaults to None.

Returns:

Returns None if the schema is already "complete" and rebuilding was not required. If rebuilding was required, returns True if rebuilding was successful, otherwise False.

classmethod model_validate

Validate a pydantic model instance.

Parameters:

obj – The object to validate.
strict – Whether to enforce types strictly.
from_attributes – Whether to extract data from object attributes.
context – Additional context to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Raises:

ValidationError – If the object could not be validated.

Returns:

The validated model instance.

classmethod model_validate_json

Usage documentation: see the JSON parsing section of the Pydantic JSON concepts guide.

Validate the given JSON data against the Pydantic model.

Parameters:

json_data – The JSON data to validate.
strict – Whether to enforce types strictly.
context – Extra variables to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Returns:

The validated Pydantic model.

Raises:

ValidationError – If json_data is not a JSON string or the object could not be validated.

classmethod model_validate_strings( obj:Any, *, strict:bool|None=None, context:Any|None=None, by_alias:bool|None=None, by_name:bool|None=None, )→Self

Validate the given object with string data against the Pydantic model.

Parameters:

obj – The object containing string data to validate.
strict – Whether to enforce types strictly.
context – Extra variables to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Returns:

The validated Pydantic model.

classmethod parse_file( path:str|Path, *, content_type:str|None=None, encoding:str='utf8', proto:DeprecatedParseProtocol|None=None, allow_pickle:bool=False, )→Self

classmethod parse_obj(obj:Any)→Self

classmethod parse_raw( b:str|bytes, *, content_type:str|None=None, encoding:str='utf8', proto:DeprecatedParseProtocol|None=None, allow_pickle:bool=False, )→Self

classmethod schema( by_alias:bool=True, ref_template:str='#/$defs/{model}', )→Dict[str,Any]

classmethod schema_json(
*,
by_alias:bool=True,
ref_template:str='#/$defs/{model}',
**dumps_kwargs:Any,
)→str

supports_backend(backend:str)→bool: Override this method if the speculation algorithm supports only a subset of the possible backends.
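Narrowing backend support is an ordinary method override. In the hypothetical sketch below, `DecodingConfigSketch` stands in for DecodingBaseConfig and is assumed to accept every backend by default, the backend strings are assumptions, and `check_backend` shows how a caller might reject an unsupported combination.

```python
class DecodingConfigSketch:
    """Hypothetical stand-in for DecodingBaseConfig."""

    def supports_backend(self, backend: str) -> bool:
        return True  # assumed default: every backend works


class PyTorchOnlyConfig(DecodingConfigSketch):
    """Hypothetical algorithm that only runs on one backend."""

    def supports_backend(self, backend: str) -> bool:
        return backend == "pytorch"  # backend name is an assumption


def check_backend(config: DecodingConfigSketch, backend: str) -> None:
    """Raise ValueError if config cannot run on the given backend."""
    if not config.supports_backend(backend):
        raise ValueError(f"{type(config).__name__} does not support backend {backend!r}")


check_backend(PyTorchOnlyConfig(), "pytorch")  # passes silently
```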

classmethod update_forward_refs(
**localns:Any,
)→None

validate()→None: Do any additional error checking here.

validator validate_draft_len_schedule_and_sort » draft_len_schedule: Validate draft_len_schedule and sort it by batch size threshold.
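A validator that validates and sorts the schedule might look like the sketch below. The validation rules (positive integer thresholds, non-negative draft lengths) and the lookup rule in the hypothetical `draft_len_for` helper (use the smallest threshold not below the batch size, otherwise 0) are assumptions for illustration only; consult the TensorRT-LLM source for the actual semantics.

```python
from bisect import bisect_left


def sort_draft_len_schedule(schedule: dict[int, int]) -> dict[int, int]:
    """Validate and sort a {batch_size_threshold: draft_len} mapping (sketch).

    Assumed rules: thresholds are positive ints, draft lengths are non-negative ints.
    """
    for threshold, draft_len in schedule.items():
        if not isinstance(threshold, int) or threshold <= 0:
            raise ValueError(f"batch size threshold must be a positive int, got {threshold!r}")
        if not isinstance(draft_len, int) or draft_len < 0:
            raise ValueError(f"draft length must be a non-negative int, got {draft_len!r}")
    return dict(sorted(schedule.items()))  # dicts keep insertion order


def draft_len_for(schedule: dict[int, int], batch_size: int) -> int:
    """Hypothetical lookup: use the smallest threshold >= batch_size, else 0."""
    thresholds = list(schedule)  # already sorted ascending
    i = bisect_left(thresholds, batch_size)
    return schedule[thresholds[i]] if i < len(thresholds) else 0


schedule = sort_draft_len_schedule({32: 1, 4: 4, 8: 2})
print(schedule)                     # {4: 4, 8: 2, 32: 1}
print(draft_len_for(schedule, 6))   # 2
print(draft_len_for(schedule, 64))  # 0
```

Sorting once at validation time keeps the per-step lookup a simple binary search over the thresholds.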

decoding_type:ClassVar[str]='NGram'

model_computed_fields={}

model_config:ClassVar[ConfigDict]={'extra':'forbid'}: Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict.

property model_extra:dict[str,Any]|None

Get extra fields set during validation.

Returns:: A dictionary of extra fields, or None if config.extra is not set to "allow".

model_fields = {
'acceptance_length_threshold': FieldInfo(annotation=Union[float, NoneType], required=False, default=None),
'acceptance_window': FieldInfo(annotation=Union[int, NoneType], required=False, default=None),
'draft_len_schedule': FieldInfo(annotation=Union[dict[int, int], NoneType], required=False, default=None),
'is_keep_all': FieldInfo(annotation=bool, required=False, default=True),
'is_public_pool': FieldInfo(annotation=bool, required=False, default=True),
'is_use_oldest': FieldInfo(annotation=bool, required=False, default=True),
'load_format': FieldInfo(annotation=Union[str, NoneType], required=False, default=None),
'max_concurrency': FieldInfo(annotation=Union[int, NoneType], required=False, default=None),
'max_draft_len': FieldInfo(annotation=Union[int, NoneType], required=False, default=None),
'max_matching_ngram_size': FieldInfo(annotation=int, required=False, default=0),
'max_total_draft_tokens': FieldInfo(annotation=Union[int, NoneType], required=False, default=None),
'speculative_model_dir': FieldInfo(annotation=Union[str, Path, NoneType], required=False, default=None)
}

property model_fields_set:set[str]

Returns the set of fields that have been explicitly set on this model instance.

Returns:

A set of strings representing the fields that have been set, i.e. that were not filled from defaults.

property spec_dec_mode

class tensorrt_llm.llmapi.UserProvidedDecodingConfig( *, max_draft_len:int|None=None, max_total_draft_tokens:int|None=None, speculative_model_dir:str|Path|None=None, max_concurrency:int|None=None, draft_len_schedule:dict[int,int]|None=None, load_format:str|None=None, acceptance_window:int|None=None, acceptance_length_threshold:float|None=None, drafter:object, resource_manager:object=None, )

Bases: DecodingBaseConfig

field acceptance_length_threshold:float|None=None

field acceptance_window:int|None=None

field draft_len_schedule:dict[int,int]|None=None

field drafter:object [Required]

field load_format:str|None=None

field max_concurrency:int|None=None

field max_draft_len:int|None=None

field max_total_draft_tokens:int|None=None

field resource_manager:object=None

field speculative_model_dir:str|Path|None=None

class Config

Bases: object

extra='forbid'

__init__(**kwargs)

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

classmethod construct(
_fields_set:set[str]|None=None,
**values:Any,
)→Self

copy(*, include=None, exclude=None, update=None, deep=False)→Self

Returns a copy of the model.

Deprecated: This method is deprecated; use model_copy instead.

If you need include or exclude, use:

```python
data = self.model_dump(include=include, exclude=exclude, round_trip=True)
data = {**data, **(update or {})}
copied = self.model_validate(data)
```

Parameters:

include – Optional set or mapping specifying which fields to include in the copied model.
exclude – Optional set or mapping specifying which fields to exclude in the copied model.
update – Optional dictionary of field-value pairs to override field values in the copied model.
deep – If True, the values of fields that are Pydantic models will be deep-copied.

Returns:

A copy of the model with included, excluded and updated fields as specified.

dict( *, include:set[int]|set[str]|Mapping[int,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|Mapping[str,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|None=None, exclude:set[int]|set[str]|Mapping[int,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|Mapping[str,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|None=None, by_alias:bool=False, exclude_unset:bool=False, exclude_defaults:bool=False, exclude_none:bool=False, )→Dict[str,Any]

classmethod from_dict(data:dict)

classmethod from_orm(obj:Any)→Self

json(
*,
include:set[int]|set[str]|Mapping[int,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|Mapping[str,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|None=None,
exclude:set[int]|set[str]|Mapping[int,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|Mapping[str,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|None=None,
by_alias:bool=False,
exclude_unset:bool=False,
exclude_defaults:bool=False,
exclude_none:bool=False,
encoder:Callable[[Any],Any]|None=PydanticUndefined,
models_as_dict:bool=PydanticUndefined,
**dumps_kwargs:Any,
)→str

classmethod model_construct(
_fields_set:set[str]|None=None,
**values:Any,
)→Self

Creates a new instance of the Model class from already-validated data.

Creates a new model setting __dict__ and __pydantic_fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed.

Note: model_construct() generally respects the model_config.extra setting on the provided model. That is, if model_config.extra == 'allow', then all extra passed values are added to the model instance's __dict__ and __pydantic_extra__ fields. If model_config.extra == 'ignore' (the default), then all extra passed values are ignored. Because no validation is performed with a call to model_construct(), having model_config.extra == 'forbid' does not result in an error if extra values are passed, but they will be ignored.

Parameters:

_fields_set – A set of field names that were originally explicitly set during instantiation. If provided, this is used directly for the model_fields_set attribute. Otherwise, the field names from the values argument are used.
values – Trusted or pre-validated data dictionary.

Returns:

A new instance of the Model class, built from the provided pre-validated data.

model_copy( *, update:Mapping[str,Any]|None=None, deep:bool=False, )→Self

Usage documentation: see the model_copy section of the Pydantic serialization concepts guide.

Returns a copy of the model.

Note: The underlying instance's __dict__ attribute is copied. This might have unexpected side effects if you store anything in it on top of the model fields (e.g. the value of cached properties).

Parameters:

update – Values to change or add in the new model. Note: the data is not validated before creating the new model, so you should trust this data.
deep – Set to True to make a deep copy of the model.

Returns:

New model instance.
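The two caveats above (unvalidated update values and shallow copying) can be seen in a short sketch, assuming Pydantic v2. CaptureSettings is a hypothetical stand-in model, not a TensorRT-LLM class.

```python
from typing import List

from pydantic import BaseModel, ConfigDict, Field


class CaptureSettings(BaseModel):
    """Hypothetical stand-in model; not part of TensorRT-LLM."""

    model_config = ConfigDict(extra="forbid")

    capture_num_tokens: List[int] = Field(default_factory=list)
    max_num_streams: int = Field(default=1, ge=1)


original = CaptureSettings(capture_num_tokens=[1, 2, 4])

# Values passed through `update` are NOT validated: -5 violates ge=1 but is accepted.
patched = original.model_copy(update={"max_num_streams": -5})

# A shallow copy shares mutable field values (here, the list) with the original...
shallow = original.model_copy()
# ...whereas deep=True copies them.
deep = original.model_copy(deep=True)

print(patched.max_num_streams)   # -5
print(original.max_num_streams)  # 1
print(shallow.capture_num_tokens is original.capture_num_tokens)  # True
print(deep.capture_num_tokens is original.capture_num_tokens)     # False
```

If the updated values come from an untrusted source, re-validating (for example with model_validate) is the safer choice.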

Usage Documentation: [model_dump](../concepts/serialization.md#modelmodel_dump)

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

Parameters:

mode – The mode in which to_python should run. If mode is ‘json’, the output will only contain JSON serializable types. If mode is ‘python’, the output may contain non-JSON-serializable Python objects.
include – A set of fields to include in the output.
exclude – A set of fields to exclude from the output.
context – Additional context to pass to the serializer.
by_alias – Whether to use the field’s alias in the dictionary key if defined.
exclude_unset – Whether to exclude fields that have not been explicitly set.
exclude_defaults – Whether to exclude fields that are set to their default value.
exclude_none – Whether to exclude fields that have a value of None.
round_trip – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings – How to handle serialization errors. False/“none” ignores them, True/“warn” logs errors, “error” raises a pydantic_core.PydanticSerializationError.
fallback – A function to call when an unknown value is encountered. If not provided, a pydantic_core.PydanticSerializationError is raised.
serialize_as_any – Whether to serialize fields with duck-typing serialization behavior.

Returns:

A dictionary representation of the model.
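The difference between exclude_unset, exclude_defaults and exclude_none is easiest to see side by side. A minimal sketch assuming Pydantic v2; CompileFlags is a hypothetical stand-in model, not a TensorRT-LLM class.

```python
from typing import List, Optional

from pydantic import BaseModel


class CompileFlags(BaseModel):
    """Hypothetical stand-in model; not part of TensorRT-LLM."""

    enable_fullgraph: bool = True
    max_num_streams: int = 1
    capture_num_tokens: Optional[List[int]] = None


explicit_default = CompileFlags(max_num_streams=1)  # set explicitly, equal to the default
changed = CompileFlags(max_num_streams=4)

# exclude_unset keys off *whether* a field was passed...
print(explicit_default.model_dump(exclude_unset=True))     # {'max_num_streams': 1}
# ...while exclude_defaults keys off *whether the value equals the default*.
print(explicit_default.model_dump(exclude_defaults=True))  # {}
print(changed.model_dump(exclude_defaults=True))           # {'max_num_streams': 4}
# exclude_none drops only the fields whose value is None.
print(changed.model_dump(exclude_none=True))  # {'enable_fullgraph': True, 'max_num_streams': 4}
```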

Usage Documentation: [model_dump_json](../concepts/serialization.md#modelmodel_dump_json)

Generates a JSON representation of the model using Pydantic’s to_json method.

Parameters:

indent – Indentation to use in the JSON output. If None is passed, the output will be compact.
include – Field(s) to include in the JSON output.
exclude – Field(s) to exclude from the JSON output.
context – Additional context to pass to the serializer.
by_alias – Whether to serialize using field aliases.
exclude_unset – Whether to exclude fields that have not been explicitly set.
exclude_defaults – Whether to exclude fields that are set to their default value.
exclude_none – Whether to exclude fields that have a value of None.
round_trip – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings – How to handle serialization errors. False/“none” ignores them, True/“warn” logs errors, “error” raises a pydantic_core.PydanticSerializationError.
fallback – A function to call when an unknown value is encountered. If not provided, a pydantic_core.PydanticSerializationError is raised.
serialize_as_any – Whether to serialize fields with duck-typing serialization behavior.

Returns:

A JSON string representation of the model.
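A small sketch of the indent parameter and of round-tripping the output, assuming Pydantic v2; CompileFlags is a hypothetical stand-in model.

```python
import json

from pydantic import BaseModel


class CompileFlags(BaseModel):
    """Hypothetical stand-in model; not part of TensorRT-LLM."""

    enable_fullgraph: bool = True
    max_num_streams: int = 1


flags = CompileFlags(max_num_streams=2)

compact = flags.model_dump_json()          # indent=None: no whitespace between tokens
pretty = flags.model_dump_json(indent=2)   # one field per line, two-space indent

print(compact)  # {"enable_fullgraph":true,"max_num_streams":2}

# Both forms decode to the same data, and the JSON validates back into an equal model.
same_data = json.loads(pretty) == json.loads(compact)
round_trip = CompileFlags.model_validate_json(compact) == flags
```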

Generates a JSON schema for a model class.

Parameters:

by_alias – Whether to use attribute aliases or not.
ref_template – The reference template.
schema_generator – To override the logic used to generate the JSON schema, as a subclass of GenerateJsonSchema with your desired modifications.
mode – The mode in which to generate the schema.

Returns:

The JSON schema for the given model class.
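In Pydantic v2 this schema is produced by the model_json_schema() class method. The sketch below (assuming Pydantic v2; StreamSettings is a hypothetical stand-in) shows how field constraints and an extra="forbid" config, the setting used by the configs on this page, surface in the generated schema.

```python
from pydantic import BaseModel, ConfigDict, Field


class StreamSettings(BaseModel):
    """Hypothetical stand-in model; not part of TensorRT-LLM."""

    model_config = ConfigDict(extra="forbid")

    max_num_streams: int = Field(default=1, ge=1, description="Maximum number of CUDA streams.")


schema = StreamSettings.model_json_schema()
props = schema["properties"]["max_num_streams"]

print(props["minimum"])                # 1     (from ge=1)
print(props["default"])                # 1
print(schema["additionalProperties"])  # False (from extra="forbid")
```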

classmethod model_parametrized_name(params: tuple[type[Any], ...]) → str

Compute the class name for parametrizations of generic classes.

This method can be overridden to achieve a custom naming scheme for generic BaseModels.

Parameters: params – Tuple of types of the class. Given a generic class Model with 2 type variables and a concrete model Model[str, int], the value (str, int) would be passed to params.
Returns: String representing the new class where params are passed to cls as type variables.
Raises: TypeError – Raised when trying to generate concrete names for non-generic models.

model_post_init(context: Any, /) → None

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:

self – The BaseModel instance.
context – The context.

classmethod model_rebuild(*, force: bool = False, raise_errors: bool = True, _parent_namespace_depth: int = 2, _types_namespace: MappingNamespace | None = None) → bool | None

Try to rebuild the pydantic-core schema for the model.

This may be necessary when one of the annotations is a ForwardRef which could not be resolved during the initial attempt to build the schema, and automatic rebuilding fails.

Parameters:

force – Whether to force the rebuilding of the model schema, defaults to False.
raise_errors – Whether to raise errors, defaults to True.
_parent_namespace_depth – The depth level of the parent namespace, defaults to 2.
_types_namespace – The types namespace, defaults to None.

Returns:

Returns None if the schema is already “complete” and rebuilding was not required. If rebuilding was required, returns True if rebuilding was successful, otherwise False.
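A sketch of the three return values described above, assuming Pydantic v2 and module-level definitions. Pipeline and Stage are hypothetical models; Pipeline refers to Stage before Stage exists, so its schema starts out incomplete.

```python
from pydantic import BaseModel


class Pipeline(BaseModel):
    """Hypothetical model with a forward reference."""

    first_stage: "Stage"  # cannot be resolved yet: Stage is defined below


class Stage(BaseModel):
    """Hypothetical model referenced by Pipeline."""

    name: str


# Stage now exists, so rebuilding is required and succeeds: returns True.
first = Pipeline.model_rebuild()
# The schema is already complete, so a second call is a no-op: returns None.
second = Pipeline.model_rebuild()

p = Pipeline(first_stage={"name": "draft"})
print(first, second, p.first_stage.name)  # True None draft
```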

Validate a pydantic model instance.

Parameters:

obj – The object to validate.
strict – Whether to enforce types strictly.
from_attributes – Whether to extract data from object attributes.
context – Additional context to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Raises:

ValidationError – If the object could not be validated.

Returns:

The validated model instance.
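A short sketch of the strict and from_attributes options, assuming Pydantic v2. StreamSettings is a hypothetical stand-in model, and SimpleNamespace stands in for any object that exposes the fields as attributes.

```python
from types import SimpleNamespace

from pydantic import BaseModel, ValidationError


class StreamSettings(BaseModel):
    """Hypothetical stand-in model; not part of TensorRT-LLM."""

    max_num_streams: int = 1
    enable_inductor: bool = False


# Lax mode (the default) coerces the numeric string "2" to the int 2.
lax = StreamSettings.model_validate({"max_num_streams": "2"})

# strict=True rejects the same input because "2" is a str, not an int.
try:
    StreamSettings.model_validate({"max_num_streams": "2"}, strict=True)
    strict_ok = True
except ValidationError:
    strict_ok = False

# from_attributes=True reads field values from an object's attributes instead of a dict.
source = SimpleNamespace(max_num_streams=4, enable_inductor=True)
from_obj = StreamSettings.model_validate(source, from_attributes=True)

print(lax.max_num_streams, strict_ok, from_obj.max_num_streams)  # 2 False 4
```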

Usage Documentation: [JSON Parsing](../concepts/json.md#json-parsing)

Validate the given JSON data against the Pydantic model.

Parameters:

json_data – The JSON data to validate.
strict – Whether to enforce types strictly.
context – Extra variables to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Returns:

The validated Pydantic model.

Raises:

ValidationError – If json_data is not a JSON string or the object could not be validated.
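Because the configs on this page use extra="forbid", a misspelled key in a JSON payload fails validation instead of being silently dropped. A minimal sketch, assuming Pydantic v2; StreamSettings and the misspelled key max_streams are hypothetical.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class StreamSettings(BaseModel):
    """Hypothetical stand-in model; not part of TensorRT-LLM."""

    model_config = ConfigDict(extra="forbid")

    max_num_streams: int = 1


ok = StreamSettings.model_validate_json('{"max_num_streams": 3}')


def rejects(payload: str) -> bool:
    """Return True if validating the JSON payload raises ValidationError."""
    try:
        StreamSettings.model_validate_json(payload)
    except ValidationError:
        return True
    return False


print(ok.max_num_streams)                                   # 3
print(rejects('{"max_num_streams": 3, "max_streams": 4}'))  # True: unknown key under extra="forbid"
print(rejects('{"max_num_streams": 3'))                     # True: malformed JSON
```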

classmethod model_validate_strings(obj: Any, *, strict: bool | None = None, context: Any | None = None, by_alias: bool | None = None, by_name: bool | None = None) → Self

Validate the given object with string data against the Pydantic model.

Parameters:

obj – The object containing string data to validate.
strict – Whether to enforce types strictly.
context – Extra variables to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Returns:

The validated Pydantic model.
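model_validate_strings() suits inputs where every value arrives as text, such as environment variables or command-line flags. A minimal sketch, assuming a recent Pydantic v2 release that provides this method; StreamSettings is a hypothetical stand-in model.

```python
from pydantic import BaseModel


class StreamSettings(BaseModel):
    """Hypothetical stand-in model; not part of TensorRT-LLM."""

    max_num_streams: int = 1
    enable_inductor: bool = False


# Every value is a string; each one is parsed into the field's declared type.
settings = StreamSettings.model_validate_strings(
    {"max_num_streams": "3", "enable_inductor": "true"}
)
print(settings.max_num_streams, settings.enable_inductor)  # 3 True
```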

classmethod parse_file(path: str | Path, *, content_type: str | None = None, encoding: str = 'utf8', proto: DeprecatedParseProtocol | None = None, allow_pickle: bool = False) → Self

classmethod parse_obj(obj: Any) → Self

classmethod parse_raw(b: str | bytes, *, content_type: str | None = None, encoding: str = 'utf8', proto: DeprecatedParseProtocol | None = None, allow_pickle: bool = False) → Self

classmethod schema(by_alias: bool = True, ref_template: str = '#/$defs/{model}') → Dict[str, Any]

classmethod schema_json(
*,
by_alias: bool = True,
ref_template: str = '#/$defs/{model}',
**dumps_kwargs: Any,
) → str

supports_backend(backend: str) → bool

Override this method if the speculation algorithm supports only a subset of the possible backends.

classmethod update_forward_refs(
**localns: Any,
) → None

validate() → None

Hook for performing any additional error checking.

validator validate_draft_len_schedule_and_sort » draft_len_schedule

Validate and sort draft_len_schedule by batch size thresholds.
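The validator above only states that draft_len_schedule, a dict[int, int] keyed by batch size threshold, is validated and sorted. The sketch below is a hypothetical stdlib re-creation of such a step; the function name and the specific checks (positive thresholds, non-negative draft lengths) are assumptions, not TensorRT-LLM's actual rules.

```python
from typing import Dict, Optional


def sort_draft_len_schedule(
    schedule: Optional[Dict[int, int]],
) -> Optional[Dict[int, int]]:
    """Hypothetical validate-and-sort step for a {batch_size_threshold: draft_len}
    mapping. The checks are illustrative assumptions only."""
    if schedule is None:
        return None
    for threshold, draft_len in schedule.items():
        if not isinstance(threshold, int) or threshold < 1:
            raise ValueError(f"batch size threshold must be a positive int, got {threshold!r}")
        if not isinstance(draft_len, int) or draft_len < 0:
            raise ValueError(f"draft length must be a non-negative int, got {draft_len!r}")
    # Dicts preserve insertion order, so rebuilding from sorted items
    # yields a mapping ordered by ascending batch size threshold.
    return dict(sorted(schedule.items()))


print(sort_draft_len_schedule({32: 1, 1: 4, 8: 2}))  # {1: 4, 8: 2, 32: 1}
```

Sorting up front means later code can walk the thresholds in ascending order without re-sorting on every lookup.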

decoding_type: ClassVar[str] = 'User_Provided'

model_computed_fields = {}

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

property model_extra: dict[str, Any] | None

Get extra fields set during validation.

Returns: A dictionary of extra fields, or None if config.extra is not set to “allow”.

model_fields = {
'acceptance_length_threshold': FieldInfo(annotation=Union[float, NoneType], required=False, default=None),
'acceptance_window': FieldInfo(annotation=Union[int, NoneType], required=False, default=None),
'draft_len_schedule': FieldInfo(annotation=Union[dict[int, int], NoneType], required=False, default=None),
'drafter': FieldInfo(annotation=object, required=True),
'load_format': FieldInfo(annotation=Union[str, NoneType], required=False, default=None),
'max_concurrency': FieldInfo(annotation=Union[int, NoneType], required=False, default=None),
'max_draft_len': FieldInfo(annotation=Union[int, NoneType], required=False, default=None),
'max_total_draft_tokens': FieldInfo(annotation=Union[int, NoneType], required=False, default=None),
'resource_manager': FieldInfo(annotation=object, required=False, default=None),
'speculative_model_dir': FieldInfo(annotation=Union[str, Path, NoneType], required=False, default=None)
}

property model_fields_set: set[str]

Returns the set of fields that have been explicitly set on this model instance.

Returns:

A set of strings representing the fields that have been set, i.e. that were not filled from defaults.

property spec_dec_mode

class tensorrt_llm.llmapi.TorchCompileConfig(*, enable_fullgraph: bool = True, enable_inductor: bool = False, enable_piecewise_cuda_graph: bool = False, capture_num_tokens: List[int] | None = None, enable_userbuffers: bool = True, max_num_streams: int = 1)

Bases: StrictBaseModel

Configuration for torch.compile.

field capture_num_tokens: List[int] | None = None

List of token counts for which to capture the piecewise CUDA graph. If not provided, the token counts are the same as cuda_graph_config.batch_sizes.

field enable_fullgraph: bool = True

Enable full-graph compilation in torch.compile.

field enable_inductor: bool = False

Enable the Inductor backend in torch.compile.

field enable_piecewise_cuda_graph: bool = False

Enable piecewise CUDA graphs in torch.compile.

field enable_userbuffers: bool = True

When torch.compile is enabled, userbuffers are enabled by default.

field max_num_streams: int = 1

The maximum number of CUDA streams to use for torch.compile.
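To make the field list concrete without requiring TensorRT-LLM, the sketch below mirrors the documented fields, defaults, the max_num_streams >= 1 check and the capture_num_tokens fallback in a plain Pydantic v2 model. TorchCompileSettings and its resolved_capture_num_tokens helper are hypothetical; this is not the TensorRT-LLM class or its implementation.

```python
from typing import List, Optional

from pydantic import BaseModel, ConfigDict, ValidationError, field_validator


class TorchCompileSettings(BaseModel):
    """Hypothetical stand-in mirroring TorchCompileConfig's documented fields
    and defaults. It is NOT the TensorRT-LLM class."""

    # Like StrictBaseModel's {'extra': 'forbid'}: unknown keyword arguments are rejected.
    model_config = ConfigDict(extra="forbid")

    enable_fullgraph: bool = True
    enable_inductor: bool = False
    enable_piecewise_cuda_graph: bool = False
    capture_num_tokens: Optional[List[int]] = None
    enable_userbuffers: bool = True
    max_num_streams: int = 1

    @field_validator("max_num_streams")
    @classmethod
    def check_max_num_streams(cls, value: int) -> int:
        # Mirrors the documented constraint max_num_streams >= 1.
        if value < 1:
            raise ValueError("max_num_streams must be >= 1")
        return value

    def resolved_capture_num_tokens(self, cuda_graph_batch_sizes: List[int]) -> List[int]:
        # Documented fallback: without an explicit list, reuse the CUDA graph batch sizes.
        if self.capture_num_tokens is None:
            return list(cuda_graph_batch_sizes)
        return list(self.capture_num_tokens)


def rejects(**kwargs) -> bool:
    """Return True if constructing the stand-in raises ValidationError."""
    try:
        TorchCompileSettings(**kwargs)
    except ValidationError:
        return True
    return False


cfg = TorchCompileSettings(enable_piecewise_cuda_graph=True)
print(cfg.resolved_capture_num_tokens([1, 2, 4, 8]))  # [1, 2, 4, 8]
typo_rejected = rejects(max_num_stream=2)   # misspelled field name
zero_rejected = rejects(max_num_streams=0)  # violates max_num_streams >= 1
```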

class Config

Bases: object

extra = 'forbid'

__init__(**data: Any) → None

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

classmethod construct(
_fields_set: set[str] | None = None,
**values: Any,
) → Self

Returns a copy of the model.

Deprecated: This method is now deprecated; use model_copy instead.

If you need include or exclude, use:

`data = self.model_dump(include=include, exclude=exclude, round_trip=True); data = {**data, **(update or {})}; copied = self.model_validate(data)`

Parameters:

include – Optional set or mapping specifying which fields to include in the copied model.
exclude – Optional set or mapping specifying which fields to exclude in the copied model.
update – Optional dictionary of field-value pairs to override field values in the copied model.
deep – If True, the values of fields that are Pydantic models will be deep-copied.

Returns:

A copy of the model with included, excluded and updated fields as specified.
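The replacement recipe above can be run end to end. A minimal sketch, assuming Pydantic v2; CompileFlags is a hypothetical stand-in model, and the include/exclude/update values are arbitrary examples.

```python
from pydantic import BaseModel


class CompileFlags(BaseModel):
    """Hypothetical stand-in model; not part of TensorRT-LLM."""

    enable_fullgraph: bool = True
    enable_inductor: bool = False
    max_num_streams: int = 1


flags = CompileFlags(enable_inductor=True, max_num_streams=2)

# Arguments that would previously have gone to the deprecated copy().
include = None
exclude = {"enable_inductor"}
update = {"max_num_streams": 4}

# Dump (honouring include/exclude), merge the update, then re-validate.
data = flags.model_dump(include=include, exclude=exclude, round_trip=True)
data = {**data, **(update or {})}
copied = CompileFlags.model_validate(data)

# The excluded field falls back to its default on re-validation.
print(copied.enable_inductor, copied.max_num_streams)  # False 4
```

Unlike model_copy(update=...), this recipe validates the merged data, so invalid update values raise a ValidationError.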

dict(*, include: set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None = None, exclude: set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None = None, by_alias: bool = False, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False) → Dict[str, Any]

classmethod from_orm(obj: Any) → Self

json(
*,
include: set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None = None,
exclude: set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None = None,
by_alias: bool = False,
exclude_unset: bool = False,
exclude_defaults: bool = False,
exclude_none: bool = False,
encoder: Callable[[Any], Any] | None = PydanticUndefined,
models_as_dict: bool = PydanticUndefined,
**dumps_kwargs: Any,
) → str

classmethod model_construct(
_fields_set: set[str] | None = None,
**values: Any,
) → Self

Creates a new instance of the Model class with validated data.

Creates a new model setting __dict__ and __pydantic_fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed.

Note: model_construct() generally respects the model_config.extra setting on the provided model. That is, if model_config.extra == ‘allow’, then all extra passed values are added to the model instance’s __dict__ and __pydantic_extra__ fields. If model_config.extra == ‘ignore’ (the default), then all extra passed values are ignored. Because no validation is performed with a call to model_construct(), having model_config.extra == ‘forbid’ does not result in an error if extra values are passed, but they will be ignored.

Parameters:

_fields_set – A set of field names that were originally explicitly set during instantiation. If provided, this is directly used for the model_fields_set attribute (pydantic.BaseModel.model_fields_set). Otherwise, the field names from the values argument will be used.
values – Trusted or pre-validated data dictionary.

Returns:

A new instance of the Model class with validated data.

model_copy(*, update: Mapping[str, Any] | None = None, deep: bool = False) → Self

Usage Documentation: [model_copy](../concepts/serialization.md#model_copy)

Returns a copy of the model.

Note: The underlying instance’s __dict__ attribute (object.__dict__) is copied. This might have unexpected side effects if you store anything in it, on top of the model fields (e.g. the value of cached properties, see functools.cached_property).

Parameters:

update – Values to change or add in the new model. Note: the data is not validated before creating the new model. You should trust this data.
deep – Set to True to make a deep copy of the model.

Returns:

New model instance.

Usage Documentation: [model_dump](../concepts/serialization.md#modelmodel_dump)

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

Parameters:

mode – The mode in which to_python should run. If mode is ‘json’, the output will only contain JSON serializable types. If mode is ‘python’, the output may contain non-JSON-serializable Python objects.
include – A set of fields to include in the output.
exclude – A set of fields to exclude from the output.
context – Additional context to pass to the serializer.
by_alias – Whether to use the field’s alias in the dictionary key if defined.
exclude_unset – Whether to exclude fields that have not been explicitly set.
exclude_defaults – Whether to exclude fields that are set to their default value.
exclude_none – Whether to exclude fields that have a value of None.
round_trip – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings – How to handle serialization errors. False/“none” ignores them, True/“warn” logs errors, “error” raises a pydantic_core.PydanticSerializationError.
fallback – A function to call when an unknown value is encountered. If not provided, a pydantic_core.PydanticSerializationError is raised.
serialize_as_any – Whether to serialize fields with duck-typing serialization behavior.

Returns:

A dictionary representation of the model.

Usage Documentation: [model_dump_json](../concepts/serialization.md#modelmodel_dump_json)

Generates a JSON representation of the model using Pydantic’s to_json method.

Parameters:

indent – Indentation to use in the JSON output. If None is passed, the output will be compact.
include – Field(s) to include in the JSON output.
exclude – Field(s) to exclude from the JSON output.
context – Additional context to pass to the serializer.
by_alias – Whether to serialize using field aliases.
exclude_unset – Whether to exclude fields that have not been explicitly set.
exclude_defaults – Whether to exclude fields that are set to their default value.
exclude_none – Whether to exclude fields that have a value of None.
round_trip – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings – How to handle serialization errors. False/“none” ignores them, True/“warn” logs errors, “error” raises a pydantic_core.PydanticSerializationError.
fallback – A function to call when an unknown value is encountered. If not provided, a pydantic_core.PydanticSerializationError is raised.
serialize_as_any – Whether to serialize fields with duck-typing serialization behavior.

Returns:

A JSON string representation of the model.

Generates a JSON schema for a model class.

Parameters:

by_alias – Whether to use attribute aliases or not.
ref_template – The reference template.
schema_generator – To override the logic used to generate the JSON schema, as a subclass of GenerateJsonSchema with your desired modifications.
mode – The mode in which to generate the schema.

Returns:

The JSON schema for the given model class.

classmethod model_parametrized_name(params: tuple[type[Any], ...]) → str

Compute the class name for parametrizations of generic classes.

This method can be overridden to achieve a custom naming scheme for generic BaseModels.

Parameters: params – Tuple of types of the class. Given a generic class Model with 2 type variables and a concrete model Model[str, int], the value (str, int) would be passed to params.
Returns: String representing the new class where params are passed to cls as type variables.
Raises: TypeError – Raised when trying to generate concrete names for non-generic models.

model_post_init(context: Any, /) → None

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.
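A minimal sketch of such an override, assuming Pydantic v2: a private attribute is derived from a validated field after initialization, and the hook also runs for model_construct(). CaptureSettings and _max_captured are hypothetical names used only for illustration.

```python
from typing import Any, List, Optional

from pydantic import BaseModel, PrivateAttr


class CaptureSettings(BaseModel):
    """Hypothetical stand-in model; not part of TensorRT-LLM."""

    capture_num_tokens: Optional[List[int]] = None
    _max_captured: int = PrivateAttr(default=0)

    def model_post_init(self, context: Any, /) -> None:
        # Runs once all fields are populated, so it can look at the whole model.
        if self.capture_num_tokens:
            self._max_captured = max(self.capture_num_tokens)


validated = CaptureSettings(capture_num_tokens=[1, 8, 64])
constructed = CaptureSettings.model_construct(capture_num_tokens=[2, 16])
empty = CaptureSettings()

print(validated._max_captured)    # 64
print(constructed._max_captured)  # 16
print(empty._max_captured)        # 0
```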

classmethod model_rebuild(*, force: bool = False, raise_errors: bool = True, _parent_namespace_depth: int = 2, _types_namespace: MappingNamespace | None = None) → bool | None

Try to rebuild the pydantic-core schema for the model.

This may be necessary when one of the annotations is a ForwardRef which could not be resolved during the initial attempt to build the schema, and automatic rebuilding fails.

Parameters:

force – Whether to force the rebuilding of the model schema, defaults to False.
raise_errors – Whether to raise errors, defaults to True.
_parent_namespace_depth – The depth level of the parent namespace, defaults to 2.
_types_namespace – The types namespace, defaults to None.

Returns:

Returns None if the schema is already “complete” and rebuilding was not required. If rebuilding was required, returns True if rebuilding was successful, otherwise False.

Validate a pydantic model instance.

Parameters:

obj – The object to validate.
strict – Whether to enforce types strictly.
from_attributes – Whether to extract data from object attributes.
context – Additional context to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Raises:

ValidationError – If the object could not be validated.

Returns:

The validated model instance.

Usage Documentation: [JSON Parsing](../concepts/json.md#json-parsing)

Validate the given JSON data against the Pydantic model.

Parameters:

json_data – The JSON data to validate.
strict – Whether to enforce types strictly.
context – Extra variables to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Returns:

The validated Pydantic model.

Raises:

ValidationError – If json_data is not a JSON string or the object could not be validated.

classmethod model_validate_strings(obj: Any, *, strict: bool | None = None, context: Any | None = None, by_alias: bool | None = None, by_name: bool | None = None) → Self

Validate the given object with string data against the Pydantic model.

Parameters:

obj – The object containing string data to validate.
strict – Whether to enforce types strictly.
context – Extra variables to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Returns:

The validated Pydantic model.

classmethod parse_file(path: str | Path, *, content_type: str | None = None, encoding: str = 'utf8', proto: DeprecatedParseProtocol | None = None, allow_pickle: bool = False) → Self

classmethod parse_obj(obj: Any) → Self

classmethod parse_raw(b: str | bytes, *, content_type: str | None = None, encoding: str = 'utf8', proto: DeprecatedParseProtocol | None = None, allow_pickle: bool = False) → Self

classmethod schema(by_alias: bool = True, ref_template: str = '#/$defs/{model}') → Dict[str, Any]

classmethod schema_json(
*,
by_alias: bool = True,
ref_template: str = '#/$defs/{model}',
**dumps_kwargs: Any,
) → str

classmethod update_forward_refs(**localns: Any) → None

classmethod validate(value: Any) → Self

validator validate_capture_num_tokens » capture_num_tokens

validator validate_torch_compile_max_num_streams » max_num_streams

Validate torch_compile_config.max_num_streams >= 1.

model_computed_fields = {}

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

property model_extra: dict[str, Any] | None

Get extra fields set during validation.

Returns: A dictionary of extra fields, or None if config.extra is not set to “allow”.

model_fields = {
'capture_num_tokens': FieldInfo(annotation=Union[List[int], NoneType], required=False, default=None, description='List of num of tokens to capture the piecewise CUDA graph for. If not provided, the number of tokens will be the same as cuda_graph_config.batch_sizes.'),
'enable_fullgraph': FieldInfo(annotation=bool, required=False, default=True, description='Enable full graph compilation in torch.compile.'),
'enable_inductor': FieldInfo(annotation=bool, required=False, default=False, description='Enable inductor backend in torch.compile.'),
'enable_piecewise_cuda_graph': FieldInfo(annotation=bool, required=False, default=False, description='Enable piecewise CUDA graph in torch.compile.'),
'enable_userbuffers': FieldInfo(annotation=bool, required=False, default=True, description='When torch compile is enabled, userbuffers is enabled by default.'),
'max_num_streams': FieldInfo(annotation=int, required=False, default=1, description='The maximum number of CUDA streams to use for torch.compile.')
}

property model_fields_set: set[str]

Returns the set of fields that have been explicitly set on this model instance.

Returns:

A set of strings representing the fields that have been set, i.e. that were not filled from defaults.

Bases: DecodingBaseConfig

field acceptance_length_threshold: float | None = None

field acceptance_window: int | None = None

field draft_len_schedule: dict[int, int] | None = None

field load_format: str | None = None

field max_concurrency: int | None = None

field max_draft_len: int | None = None

field max_total_draft_tokens: int | None = None

field speculative_model_dir: str | Path | None = None

class Config

Bases: object

extra = 'forbid'

__init__(**kwargs)

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

classmethod construct(
_fields_set: set[str] | None = None,
**values: Any,
) → Self

Returns a copy of the model.

Deprecated: This method is now deprecated; use model_copy instead.

If you need include or exclude, use:

`data = self.model_dump(include=include, exclude=exclude, round_trip=True); data = {**data, **(update or {})}; copied = self.model_validate(data)`

Parameters:

include – Optional set or mapping specifying which fields to include in the copied model.
exclude – Optional set or mapping specifying which fields to exclude in the copied model.
update – Optional dictionary of field-value pairs to override field values in the copied model.
deep – If True, the values of fields that are Pydantic models will be deep-copied.

Returns:

A copy of the model with included, excluded and updated fields as specified.

dict(*, include: set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None = None, exclude: set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None = None, by_alias: bool = False, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False) → Dict[str, Any]

classmethod from_dict(data: dict)

classmethod from_orm(obj: Any) → Self

json(
*,
include: set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None = None,
exclude: set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None = None,
by_alias: bool = False,
exclude_unset: bool = False,
exclude_defaults: bool = False,
exclude_none: bool = False,
encoder: Callable[[Any], Any] | None = PydanticUndefined,
models_as_dict: bool = PydanticUndefined,
**dumps_kwargs: Any,
) → str

classmethod model_construct(
_fields_set: set[str] | None = None,
**values: Any,
) → Self

Creates a new instance of the Model class with validated data.

Creates a new model setting __dict__ and __pydantic_fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed.

Note: model_construct() generally respects the model_config.extra setting on the provided model. That is, if model_config.extra == ‘allow’, then all extra passed values are added to the model instance’s __dict__ and __pydantic_extra__ fields. If model_config.extra == ‘ignore’ (the default), then all extra passed values are ignored. Because no validation is performed with a call to model_construct(), having model_config.extra == ‘forbid’ does not result in an error if extra values are passed, but they will be ignored.

Parameters:

_fields_set – A set of field names that were originally explicitly set during instantiation. If provided, this is directly used for the model_fields_set attribute (pydantic.BaseModel.model_fields_set). Otherwise, the field names from the values argument will be used.
values – Trusted or pre-validated data dictionary.

Returns:

A new instance of the Model class with validated data.

model_copy(*, update: Mapping[str, Any] | None = None, deep: bool = False) → Self

Usage Documentation: [model_copy](../concepts/serialization.md#model_copy)

Returns a copy of the model.

Note: The underlying instance’s __dict__ attribute (object.__dict__) is copied. This might have unexpected side effects if you store anything in it, on top of the model fields (e.g. the value of cached properties, see functools.cached_property).

Parameters:

update – Values to change or add in the new model. Note: the data is not validated before creating the new model. You should trust this data.
deep – Set to True to make a deep copy of the model.

Returns:

New model instance.

Usage Documentation: [model_dump](../concepts/serialization.md#modelmodel_dump)

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

Parameters:

mode – The mode in which to_python should run. If mode is ‘json’, the output will only contain JSON serializable types. If mode is ‘python’, the output may contain non-JSON-serializable Python objects.
include – A set of fields to include in the output.
exclude – A set of fields to exclude from the output.
context – Additional context to pass to the serializer.
by_alias – Whether to use the field’s alias in the dictionary key if defined.
exclude_unset – Whether to exclude fields that have not been explicitly set.
exclude_defaults – Whether to exclude fields that are set to their default value.
exclude_none – Whether to exclude fields that have a value of None.
round_trip – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings – How to handle serialization errors. False/“none” ignores them, True/“warn” logs errors, “error” raises a pydantic_core.PydanticSerializationError.
fallback – A function to call when an unknown value is encountered. If not provided, a pydantic_core.PydanticSerializationError is raised.
serialize_as_any – Whether to serialize fields with duck-typing serialization behavior.

Returns:

A dictionary representation of the model.

Usage Documentation: [model_dump_json](../concepts/serialization.md#modelmodel_dump_json)

Generates a JSON representation of the model using Pydantic’s to_json method.

Parameters:

indent – Indentation to use in the JSON output. If None is passed, the output will be compact.
include – Field(s) to include in the JSON output.
exclude – Field(s) to exclude from the JSON output.
context – Additional context to pass to the serializer.
by_alias – Whether to serialize using field aliases.
exclude_unset – Whether to exclude fields that have not been explicitly set.
exclude_defaults – Whether to exclude fields that are set to their default value.
exclude_none – Whether to exclude fields that have a value of None.
round_trip – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings – How to handle serialization errors. False/“none” ignores them, True/“warn” logs errors, “error” raises a pydantic_core.PydanticSerializationError.
fallback – A function to call when an unknown value is encountered. If not provided, a pydantic_core.PydanticSerializationError is raised.
serialize_as_any – Whether to serialize fields with duck-typing serialization behavior.

Returns:

A JSON string representation of the model.

Generates a JSON schema for a model class.

Parameters:

by_alias – Whether to use attribute aliases or not.
ref_template – The reference template.
schema_generator – To override the logic used to generate the JSON schema, as a subclass of GenerateJsonSchema with your desired modifications.
mode – The mode in which to generate the schema.

Returns:

The JSON schema for the given model class.

classmethod model_parametrized_name(params: tuple[type[Any], ...]) → str

Compute the class name for parametrizations of generic classes.

This method can be overridden to achieve a custom naming scheme for generic BaseModels.

Parameters: params – Tuple of types of the class. Given a generic class Model with 2 type variables and a concrete model Model[str, int], the value (str, int) would be passed to params.
Returns: String representing the new class where params are passed to cls as type variables.
Raises: TypeError – Raised when trying to generate concrete names for non-generic models.
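To make the default naming scheme concrete, here is a minimal sketch that reproduces the Model[str, int] style described above. The helper default_parametrized_name is hypothetical and not part of pydantic's API. It uses an empty params tuple as a stand-in for a non-generic model, and pydantic's real implementation may render some types differently.

```python
from typing import Any, Tuple


def default_parametrized_name(cls_name: str, params: Tuple[Any, ...]) -> str:
    """Hypothetical helper mimicking the 'Model[str, int]' naming style."""
    if not params:
        # Stand-in for the documented TypeError: with no type parameters
        # there is nothing to parametrize, as for a non-generic model.
        raise TypeError(f"{cls_name} is not a generic model")
    # Use each type's __name__ when it has one, otherwise fall back to repr().
    names = [getattr(p, "__name__", repr(p)) for p in params]
    return f"{cls_name}[{', '.join(names)}]"


print(default_parametrized_name("Model", (str, int)))  # Model[str, int]
```

Overriding model_parametrized_name in a subclass is the supported way to change this format. The sketch only shows what the default output looks like.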

model_post_init(context:Any,/)→None#

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:

self – The BaseModel instance.
context – The context.

classmethodmodel_rebuild( *, force:bool=False, raise_errors:bool=True, _parent_namespace_depth:int=2, _types_namespace:MappingNamespace|None=None, )→bool|None#

Try to rebuild the pydantic-core schema for the model.

This may be necessary when one of the annotations is a ForwardRef that could not be resolved during the initial attempt to build the schema, and automatic rebuilding fails.

Parameters:

force – Whether to force the rebuilding of the model schema, defaults to False.
raise_errors – Whether to raise errors, defaults to True.
_parent_namespace_depth – The depth level of the parent namespace, defaults to 2.
_types_namespace – The types namespace, defaults to None.

Returns:

Returns None if the schema is already "complete" and rebuilding was not required. If rebuilding was required, returns True if rebuilding was successful, otherwise False.

Validate a pydantic model instance.

Parameters:

obj – The object to validate.
strict – Whether to enforce types strictly.
from_attributes – Whether to extract data from object attributes.
context – Additional context to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Raises:

ValidationError – If the object could not be validated.

Returns:

The validated model instance.

Validate the given JSON data against the Pydantic model.

Parameters:

json_data – The JSON data to validate.
strict – Whether to enforce types strictly.
context – Extra variables to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Returns:

The validated Pydantic model.

Raises:

ValidationError – If json_data is not a JSON string or the object could not be validated.

classmethodmodel_validate_strings( obj:Any, *, strict:bool|None=None, context:Any|None=None, by_alias:bool|None=None, by_name:bool|None=None, )→Self#

Validate the given object with string data against the Pydantic model.

Parameters:

obj – The object containing string data to validate.
strict – Whether to enforce types strictly.
context – Extra variables to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Returns:

The validated Pydantic model.

classmethodparse_file( path:str|Path, *, content_type:str|None=None, encoding:str='utf8', proto:DeprecatedParseProtocol|None=None, allow_pickle:bool=False, )→Self#

classmethodparse_obj(obj:Any)→Self#

classmethodparse_raw( b:str|bytes, *, content_type:str|None=None, encoding:str='utf8', proto:DeprecatedParseProtocol|None=None, allow_pickle:bool=False, )→Self#

classmethodschema( by_alias:bool=True, ref_template:str='#/$defs/{model}', )→Dict[str,Any]#

classmethodschema_json(
*,
by_alias:bool=True,
ref_template:str='#/$defs/{model}',
**dumps_kwargs:Any,
)→str#

supports_backend(backend:str)→bool[source]#: Override if the speculation algorithm does not support a subset of the possible backends.

classmethodupdate_forward_refs(
**localns:Any,
)→None#

validate()→None#: Do any additional error checking here.

validatorvalidate_draft_len_schedule_and_sort » draft_len_schedule#: Validate and sort draft_len_schedule by batch size thresholds.
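The validator is described only as validating draft_len_schedule and sorting it by batch-size thresholds. The sketch below shows one plausible reading. It assumes the schedule maps a batch-size threshold (key) to a draft length (value), that thresholds must be positive integers, and that draft lengths must be non-negative integers. The function name and the exact checks are hypothetical, not TensorRT-LLM's implementation.

```python
from typing import Dict, Optional


def validate_and_sort_schedule(
    schedule: Optional[Dict[int, int]],
) -> Optional[Dict[int, int]]:
    """Hypothetical stand-in for validate_draft_len_schedule_and_sort."""
    if schedule is None:
        return None  # The field defaults to None, so there is nothing to check.
    for batch_size, draft_len in schedule.items():
        # Assumed rules: thresholds are positive ints, draft lengths are >= 0.
        if not isinstance(batch_size, int) or batch_size <= 0:
            raise ValueError(f"invalid batch-size threshold: {batch_size!r}")
        if not isinstance(draft_len, int) or draft_len < 0:
            raise ValueError(f"invalid draft length: {draft_len!r}")
    # Dicts preserve insertion order, so rebuilding from sorted items
    # yields a schedule ordered by ascending batch-size threshold.
    return dict(sorted(schedule.items()))


print(validate_and_sort_schedule({32: 1, 1: 4, 8: 2}))  # {1: 4, 8: 2, 32: 1}
```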

decoding_type:ClassVar[str]='Draft_Target'#

model_computed_fields={}#

model_config:ClassVar[ConfigDict]={'extra':'forbid'}#: Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

propertymodel_extra:dict[str,Any]|None#

Get extra fields set during validation.

Returns: A dictionary of extra fields, or None if config.extra is not set to "allow".

model_fields={'acceptance_length_threshold':FieldInfo(annotation=Union[float,NoneType],required=False,default=None),'acceptance_window':FieldInfo(annotation=Union[int,NoneType],required=False,default=None),'draft_len_schedule':FieldInfo(annotation=Union[dict[int,int],NoneType],required=False,default=None),'load_format':FieldInfo(annotation=Union[str,NoneType],required=False,default=None),'max_concurrency':FieldInfo(annotation=Union[int,NoneType],required=False,default=None),'max_draft_len':FieldInfo(annotation=Union[int,NoneType],required=False,default=None),'max_total_draft_tokens':FieldInfo(annotation=Union[int,NoneType],required=False,default=None),'speculative_model_dir':FieldInfo(annotation=Union[str,Path,NoneType],required=False,default=None)}#

propertymodel_fields_set:set[str]#

Returns the set of fields that have been explicitly set on this model instance.

Returns:

A set of strings representing the fields that have been set, i.e. that were not filled from defaults.

propertyspec_dec_mode#

tensorrt_llm.llmapi.LlmArgs#: alias of TorchLlmArgs

classtensorrt_llm.llmapi.TorchLlmArgs( *, model:str|~pathlib.Path, tokenizer:str|~pathlib.Path|~transformers.tokenization_utils_base.PreTrainedTokenizerBase|~tensorrt_llm.llmapi.tokenizer.TokenizerBase|None=None, tokenizer_mode:~typing.Literal['auto', 'slow']='auto', skip_tokenizer_init:bool=False, trust_remote_code:bool=False, tensor_parallel_size:int=1, dtype:str='auto', revision:str|None=None, tokenizer_revision:str|None=None, pipeline_parallel_size:int=1, context_parallel_size:int=1, gpus_per_node:int|None=None, moe_cluster_parallel_size:int|None=None, moe_tensor_parallel_size:int|None=None, moe_expert_parallel_size:int|None=None, enable_attention_dp:bool=False, enable_lm_head_tp_in_adp:bool=False, pp_partition:~typing.List[int]|None=None, cp_config:dict|None=<factory>, load_format:str|~tensorrt_llm.llmapi.llm_args.LoadFormat=LoadFormat.AUTO, fail_fast_on_attention_window_too_large:bool=False, enable_lora:bool=False, lora_config:~tensorrt_llm.lora_helper.LoraConfig|None=None, kv_cache_config:~tensorrt_llm.llmapi.llm_args.KvCacheConfig=<factory>, enable_chunked_prefill:bool=False, guided_decoding_backend:~typing.Literal['xgrammar', 'llguidance']|None=None, batched_logits_processor:object|None=None, iter_stats_max_iterations:int|None=None, request_stats_max_iterations:int|None=None, peft_cache_config:~tensorrt_llm.llmapi.llm_args.PeftCacheConfig|None=None, scheduler_config:~tensorrt_llm.llmapi.llm_args.SchedulerConfig=<factory>, cache_transceiver_config:~tensorrt_llm.llmapi.llm_args.CacheTransceiverConfig|None=None, sparse_attention_config:~tensorrt_llm.llmapi.llm_args.RocketSparseAttentionConfig|~tensorrt_llm.llmapi.llm_args.DeepSeekSparseAttentionConfig|None=None, 
speculative_config:~tensorrt_llm.llmapi.llm_args.DraftTargetDecodingConfig|~tensorrt_llm.llmapi.llm_args.EagleDecodingConfig|~tensorrt_llm.llmapi.llm_args.LookaheadDecodingConfig|~tensorrt_llm.llmapi.llm_args.MedusaDecodingConfig|~tensorrt_llm.llmapi.llm_args.MTPDecodingConfig|~tensorrt_llm.llmapi.llm_args.NGramDecodingConfig|~tensorrt_llm.llmapi.llm_args.UserProvidedDecodingConfig|~tensorrt_llm.llmapi.llm_args.SaveHiddenStatesDecodingConfig|~tensorrt_llm.llmapi.llm_args.AutoDecodingConfig|None=None, max_batch_size:int|None=None, max_input_len:int|None=None, max_seq_len:int|None=None, max_beam_width:int|None=None, max_num_tokens:int|None=8192, gather_generation_logits:bool=False, num_postprocess_workers:int=0, postprocess_tokenizer_dir:str|None=None, reasoning_parser:str|None=None, decoding_config:object|None=None, _mpi_session:object|None=None, otlp_traces_endpoint:str|None=None, backend:str|None=None, return_perf_metrics:bool=False, orchestrator_type:~typing.Literal['rpc', 'ray']|None=None, build_config:~tensorrt_llm.builder.BuildConfig|None=None, garbage_collection_gen0_threshold:int=20000, cuda_graph_config:~tensorrt_llm.llmapi.llm_args.CudaGraphConfig|None=<factory>, attention_dp_config:~tensorrt_llm.llmapi.llm_args.AttentionDpConfig|None=None, disable_overlap_scheduler:bool=False, moe_config:~tensorrt_llm.llmapi.llm_args.MoeConfig=<factory>, attn_backend:str='TRTLLM', sampler_type:str|~tensorrt_llm.llmapi.llm_args.SamplerType=SamplerType.auto, enable_iter_perf_stats:bool=False, enable_iter_req_stats:bool=False, print_iter_log:bool=False, perf_metrics_max_requests:int=0, batch_wait_timeout_ms:float=0, batch_wait_timeout_iters:int=0, batch_wait_max_tokens_ratio:float=0, torch_compile_config:~tensorrt_llm.llmapi.llm_args.TorchCompileConfig|None=None, enable_autotuner:bool=True, enable_layerwise_nvtx_marker:bool=False, enable_min_latency:bool=False, stream_interval:int=1, force_dynamic_quantization:bool=False, allreduce_strategy:~typing.Literal['AUTO', 'NCCL', 
'UB', 'MINLATENCY', 'ONESHOT', 'TWOSHOT', 'LOWPRECISION', 'MNNVL', 'NCCL_SYMMETRIC']|None='AUTO', checkpoint_loader:object|None=None, checkpoint_format:str|None=None, kv_connector_config:~tensorrt_llm.llmapi.llm_args.KvCacheConnectorConfig|None=None, mm_encoder_only:bool=False, ray_worker_extension_cls:str|None=None, enable_sleep:bool=False, )[source]#

Bases:BaseLlmArgs

fieldallreduce_strategy:Literal['AUTO','NCCL','UB','MINLATENCY','ONESHOT','TWOSHOT','LOWPRECISION','MNNVL','NCCL_SYMMETRIC']|None='AUTO'#: beta Allreduce strategy to use.

fieldattention_dp_config:AttentionDpConfig|None=None#: beta Optimized load-balancing for the DP Attention scheduler.

fieldattn_backend:str='TRTLLM'#: beta Attention backend to use.

fieldbackend:str|None=None#: deprecated The backend to use for this LLM instance.

fieldbatch_wait_max_tokens_ratio:float=0#: prototype Token accumulation threshold ratio for batch scheduling optimization. If greater than 0, the scheduler accumulates requests locally until the total token count reaches batch_wait_max_tokens_ratio * max_num_tokens. This improves GPU utilization by ensuring adequately sized batches. If 0, token-based batching delays are disabled.
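As a worked example of the threshold: with the default max_num_tokens of 8192 and batch_wait_max_tokens_ratio=0.5, requests accumulate until 0.5 * 8192 = 4096 tokens are pending. The helper below is a hypothetical sketch of that single check. It does not model how the real scheduler combines this ratio with batch_wait_timeout_iters or batch_wait_timeout_ms.

```python
def token_batching_ready(
    pending_tokens: int, ratio: float, max_num_tokens: int = 8192
) -> bool:
    """Hypothetical check for the batch_wait_max_tokens_ratio rule."""
    if ratio <= 0:
        # A ratio of 0 disables token-based batching delays: schedule now.
        return True
    threshold = ratio * max_num_tokens  # e.g. 0.5 * 8192 = 4096.0
    return pending_tokens >= threshold


print(token_batching_ready(4096, 0.5))  # True
print(token_batching_ready(3000, 0.5))  # False, keep accumulating
```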

fieldbatch_wait_timeout_iters:int=0#: prototype Maximum number of iterations the scheduler waits to accumulate newly arriving requests for better GPU utilization. If greater than 0, the scheduler delays batch processing to gather more requests, up to the specified iteration limit. If 0, iteration-based batching delays are disabled.

fieldbatch_wait_timeout_ms:float=0#: prototype If greater than 0, the request queue may wait up to batch_wait_timeout_ms milliseconds to receive max_batch_size requests when fewer than max_batch_size requests are currently available. If 0, no waiting occurs.

fieldbatched_logits_processor:object|None=None#: stable Batched logits processor.

fieldbuild_config:BuildConfig|None=None#: deprecated Build config.

fieldcache_transceiver_config:CacheTransceiverConfig|None=None#: prototype Cache transceiver config.

fieldcheckpoint_format:str|None=None#: prototype The format of the provided checkpoint. You may use a custom checkpoint format by subclassing BaseCheckpointLoader and registering it with register_checkpoint_loader. If neither checkpoint_format nor checkpoint_loader is provided, checkpoint_format is set to HF and the default HfCheckpointLoader is used. If both checkpoint_format and checkpoint_loader are provided, checkpoint_loader is ignored.

fieldcheckpoint_loader:object|None=None#: prototype The checkpoint loader to use for this LLM instance. You may use a custom checkpoint loader by subclassing BaseCheckpointLoader and providing an instance of the subclass here to load weights from a custom checkpoint format. If neither checkpoint_format nor checkpoint_loader is provided, checkpoint_format is set to HF and the default HfCheckpointLoader is used. If both checkpoint_format and checkpoint_loader are provided, checkpoint_loader is ignored.
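The precedence rules stated for checkpoint_format and checkpoint_loader can be summarized in a small decision function. This is a hypothetical sketch of the documented behavior only. It returns (format, explicit_loader), where an explicit_loader of None means the loader is chosen from the format (for HF, the default HfCheckpointLoader). How TensorRT-LLM resolves the loader internally is not shown here.

```python
from typing import Optional, Tuple


def resolve_checkpoint_source(
    checkpoint_format: Optional[str],
    checkpoint_loader: Optional[object],
) -> Tuple[Optional[str], Optional[object]]:
    """Hypothetical sketch of the documented precedence rules."""
    if checkpoint_format is None and checkpoint_loader is None:
        # Neither given: the format defaults to HF, whose default loader
        # is HfCheckpointLoader.
        return "HF", None
    if checkpoint_format is not None:
        # Format given, alone or together with a loader. When both are
        # given, the documented rule is that checkpoint_loader is ignored.
        return checkpoint_format, None
    # Only a loader given: use the caller's loader instance as-is.
    return None, checkpoint_loader


custom_loader = object()  # placeholder for a BaseCheckpointLoader subclass instance
print(resolve_checkpoint_source(None, None))  # ('HF', None)
```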

fieldcontext_parallel_size:int=1#: stable The context parallel size.

fieldcp_config:dict|None[Optional]#: prototype Context parallel config.

fieldcuda_graph_config:CudaGraphConfig|None[Optional]#: beta CUDA graph config. When set, CUDA graphs are used for decoding. CUDA graphs are only created for the batch sizes in cuda_graph_config.batch_sizes, and are enabled only for batches that consist entirely of decoding requests (it is hard to capture a single graph with prefill requests because the input shapes depend on the sequence lengths). Note that each CUDA graph can use up to 200 MB of extra memory.
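Because graphs are captured only for the sizes listed in cuda_graph_config.batch_sizes and each may use up to 200 MB, the extra memory is bounded by the number of sizes times 200 MB. The helper below is a hypothetical back-of-the-envelope estimate. It assumes exactly one graph per distinct batch size, which may not hold for every configuration.

```python
from typing import Sequence

# Per-graph upper bound quoted in the field description above.
MAX_MB_PER_CUDA_GRAPH = 200


def cuda_graph_memory_upper_bound_mb(batch_sizes: Sequence[int]) -> int:
    """Worst-case extra memory, assuming one graph per distinct batch size."""
    return len(set(batch_sizes)) * MAX_MB_PER_CUDA_GRAPH


print(cuda_graph_memory_upper_bound_mb([1, 2, 4, 8, 16, 32]))  # 1200
```

Trimming batch_sizes to the sizes your workload actually hits is the direct lever on this bound.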

fielddisable_overlap_scheduler:bool=False#: beta Disable the overlap scheduler.

fielddtype:str='auto'#: stable The data type to use for the model.

fieldenable_attention_dp:bool=False#: beta Enable attention data parallel.

fieldenable_autotuner:bool=True#: prototype Enable autotuner for all tunable ops. This flag is intended for debugging only; performance may degrade significantly if it is set to False.

fieldenable_chunked_prefill:bool=False#: stable Enable chunked prefill.

fieldenable_iter_perf_stats:bool=False#: prototype Enable iteration performance statistics.

fieldenable_iter_req_stats:bool=False#: prototype If true, enables per-request statistics for each iteration. enable_iter_perf_stats must also be set to true to obtain request stats.
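The documentation says request stats appear only when enable_iter_perf_stats is also true. It does not say whether TensorRT-LLM raises an error or silently omits the stats otherwise. The following hypothetical user-side guard makes the dependency explicit before constructing the LLM. The function name is illustrative only.

```python
def check_iter_stats_flags(
    enable_iter_perf_stats: bool, enable_iter_req_stats: bool
) -> None:
    """Hypothetical pre-flight check mirroring the documented dependency."""
    if enable_iter_req_stats and not enable_iter_perf_stats:
        raise ValueError(
            "enable_iter_req_stats requires enable_iter_perf_stats=True"
        )


check_iter_stats_flags(enable_iter_perf_stats=True, enable_iter_req_stats=True)
```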

fieldenable_layerwise_nvtx_marker:bool=False#: beta If true, enable layer-wise NVTX markers.

fieldenable_lm_head_tp_in_adp:bool=False#: prototype Enable tensor parallelism (TP) for the LM head in attention data parallel (DP) mode.

fieldenable_lora:bool=False#: stable Enable LoRA.

fieldenable_min_latency:bool=False#: beta If true, enable min-latency mode. Currently only used for Llama4.

fieldenable_sleep:bool=False#: prototype Enable the LLM sleep feature. The sleep feature requires extra setup that may slow down model loading. Only enable it if you intend to use this feature.

fieldfail_fast_on_attention_window_too_large:bool=False#: prototype Fail fast when attention window is too large to fit even a single sequence in the KV cache.

fieldforce_dynamic_quantization:bool=False#: prototype If true, force dynamic quantization. Defaults to False.

fieldgarbage_collection_gen0_threshold:int=20000#: beta Threshold for Python garbage collection of generation 0 objects. Lower values trigger more frequent garbage collection.
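The threshold refers to Python's own generational collector. The snippet below uses the standard gc module to show what a generation-0 threshold of 20000 means in plain Python. It assumes, but does not confirm, that TensorRT-LLM applies this field through gc.set_threshold. The helper run_with_gen0_threshold is hypothetical and restores the previous thresholds afterwards.

```python
import gc


def run_with_gen0_threshold(threshold, fn):
    """Run fn() with a temporary generation-0 GC threshold, then restore it."""
    original = gc.get_threshold()
    # Change only the generation-0 threshold; keep generations 1 and 2 as-is.
    gc.set_threshold(threshold, *original[1:])
    try:
        return fn()
    finally:
        gc.set_threshold(*original)


# Inside fn(), the generation-0 threshold equals the TorchLlmArgs default.
print(run_with_gen0_threshold(20000, lambda: gc.get_threshold()[0]))  # 20000
```

A higher generation-0 threshold means the collector runs less often, trading some memory for fewer pauses in the Python host loop.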

fieldgather_generation_logits:bool=False#: prototype Gather generation logits.

fieldgpus_per_node:int|None=None#: beta The number of GPUs per node.

fieldguided_decoding_backend:Literal['xgrammar','llguidance']|None=None#: stable Guided decoding backend. llguidance is supported in the PyTorch backend only.

fielditer_stats_max_iterations:int|None=None#: prototype The maximum number of iterations for iter stats.

fieldkv_cache_config:KvCacheConfig[Optional]#: stable KV cache config.

fieldkv_connector_config:KvCacheConnectorConfig|None=None#: prototype The config for KV cache connector.

fieldload_format:str|LoadFormat=LoadFormat.AUTO#: stable How to load the model weights. By default, detect the weight type from the model checkpoint.

fieldlora_config:LoraConfig|None=None#: stable LoRA configuration for the model.

fieldmax_batch_size:int|None=None#: stable The maximum batch size.

fieldmax_beam_width:int|None=None#: stable The maximum beam width.

fieldmax_input_len:int|None=None#: stable The maximum input length.

fieldmax_num_tokens:int|None=8192#: stable The maximum number of tokens.

fieldmax_seq_len:int|None=None#: stable The maximum sequence length.

fieldmm_encoder_only:bool=False#: prototype Only load/execute the vision encoder part of the full model. Defaults to False.

fieldmodel:str|Path[Required]#: stable The path to the model checkpoint or the model name from the Hugging Face Hub.

fieldmoe_cluster_parallel_size:int|None=None#: beta The cluster parallel size for MoE models' expert weights.

fieldmoe_config:MoeConfig[Optional]#: beta MoE config.

fieldmoe_expert_parallel_size:int|None=None#: stable The expert parallel size for MoE models' expert weights.

fieldmoe_tensor_parallel_size:int|None=None#: stable The tensor parallel size for MoE models' expert weights.

fieldmpi_session:object|None=None(alias'_mpi_session')#: stable The optional MPI session to use for this LLM instance.

fieldnum_postprocess_workers:int=0#: prototype The number of processes used for postprocessing the generated tokens, including detokenization.

fieldorchestrator_type:Literal['rpc','ray']|None=None#: prototype The orchestrator type to use. Defaults to None, which uses MPI.

fieldotlp_traces_endpoint:str|None=None#: prototype Target URL to which OpenTelemetry traces will be sent.

fieldpeft_cache_config:PeftCacheConfig|None=None#: prototype PEFT cache config.

fieldperf_metrics_max_requests:int=0#: prototype The maximum number of requests for perf metrics. Must also set request_perf_metrics to true to get perf metrics.

fieldpipeline_parallel_size:int=1#: stable The pipeline parallel size.

fieldpostprocess_tokenizer_dir:str|None=None#: prototype The path to the tokenizer directory for postprocessing.

fieldpp_partition:List[int]|None=None#: prototype Pipeline parallel partition: a list giving the number of layers assigned to each rank.
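For example, pp_partition=[10, 12, 10] on a 32-layer model would give rank 0 layers 0 to 9, rank 1 layers 10 to 21, and rank 2 layers 22 to 31. The sketch below computes those ranges. It assumes the counts must sum to the model's layer count and that the list length matches pipeline_parallel_size. Both constraints are assumptions, and layer_ranges is a hypothetical helper.

```python
from typing import List, Tuple


def layer_ranges(pp_partition: List[int], num_layers: int) -> List[Tuple[int, int]]:
    """Hypothetical: map per-rank layer counts to [start, end) layer ranges."""
    # Assumed constraint: the counts must cover every layer exactly once.
    if sum(pp_partition) != num_layers:
        raise ValueError(
            f"pp_partition sums to {sum(pp_partition)}, expected {num_layers}"
        )
    ranges, start = [], 0
    for count in pp_partition:
        ranges.append((start, start + count))
        start += count
    return ranges


print(layer_ranges([10, 12, 10], num_layers=32))  # [(0, 10), (10, 22), (22, 32)]
```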

fieldprint_iter_log:bool=False#: beta Print iteration logs.

fieldray_worker_extension_cls:str|None=None#: prototype The full worker extension class name, including the module path. Allows users to extend the functionality of the RayGPUWorker class.

fieldreasoning_parser:str|None=None#: prototype The parser to separate reasoning content from output.

fieldrequest_stats_max_iterations:int|None=None#: prototype The maximum number of iterations for request stats.

fieldreturn_perf_metrics:bool=False#: prototype Return perf metrics.

fieldrevision:str|None=None#: stable The revision to use for the model.

fieldsampler_type:str|SamplerType=SamplerType.auto#: beta The type of sampler to use. Options are TRTLLMSampler, TorchSampler or auto. Defaults to auto, which will use TorchSampler unless BeamSearch is requested.
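The description says "auto" uses TorchSampler unless beam search is requested. The sketch below infers that the other listed option, TRTLLMSampler, is chosen in that case. It also assumes beam search counts as "requested" when max_beam_width is greater than 1. Both points are assumptions, and resolve_sampler is a hypothetical helper.

```python
def resolve_sampler(sampler_type: str, max_beam_width: int = 1) -> str:
    """Hypothetical resolution of sampler_type='auto'."""
    if sampler_type != "auto":
        return sampler_type  # An explicit choice is used as given.
    # Assumption: beam search is "requested" when max_beam_width > 1.
    return "TRTLLMSampler" if max_beam_width > 1 else "TorchSampler"


print(resolve_sampler("auto"))                    # TorchSampler
print(resolve_sampler("auto", max_beam_width=4))  # TRTLLMSampler
```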

fieldscheduler_config:SchedulerConfig[Optional]#: prototype Scheduler config.

fieldskip_tokenizer_init:bool=False#: stable Whether to skip the tokenizer initialization.

fieldsparse_attention_config:SparseAttentionConfig|None=None#: prototype Sparse attention config.

fieldspeculative_config:SpeculativeConfig=None#: stable Speculative decoding config.

fieldstream_interval:int=1#: stable The iteration interval to create responses under the streaming mode. Set this to a larger value when the batch size is large, which helps reduce the streaming overhead.
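With stream_interval=4, a 10-iteration generation would create streaming responses after iterations 4 and 8 instead of after every iteration, reducing per-response overhead. The sketch below lists those iterations. It also assumes that a final response is always emitted at the last iteration so no tokens are held back. That final-flush behavior and the helper streaming_iterations are hypothetical.

```python
from typing import List


def streaming_iterations(num_iterations: int, stream_interval: int = 1) -> List[int]:
    """Hypothetical: 1-based iterations at which a streaming response is created."""
    if stream_interval < 1:
        raise ValueError("stream_interval must be >= 1")
    emit = [i for i in range(1, num_iterations + 1) if i % stream_interval == 0]
    # Assumption: the final iteration always produces a response.
    if num_iterations > 0 and (not emit or emit[-1] != num_iterations):
        emit.append(num_iterations)
    return emit


print(streaming_iterations(10, stream_interval=4))  # [4, 8, 10]
```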

fieldtensor_parallel_size:int=1#: stable The tensor parallel size.

fieldtokenizer:str|Path|TokenizerBase|PreTrainedTokenizerBase|None=None#: stable The path to the tokenizer checkpoint or the tokenizer name from the Hugging Face Hub.

fieldtokenizer_mode:Literal['auto','slow']='auto'#: stable The mode to initialize the tokenizer.

fieldtokenizer_revision:str|None=None#: stable The revision to use for the tokenizer.

fieldtorch_compile_config:TorchCompileConfig|None=None#: prototype Torch compile config.

fieldtrust_remote_code:bool=False#: stable Whether to trust the remote code.

classConfig#

Bases:object

extra='forbid'#

__init__(**data:Any)→None#

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

validatorconvert_load_format » load_format[source]#

classmethodfrom_kwargs(
**kwargs:Any,
)→BaseLlmArgs#

Create an LlmArgs instance from kwargs.

Parameters: kwargs (Any) – Arguments passed to the LlmArgs constructor.
Returns: The BaseLlmArgs instance.
Return type: tensorrt_llm.llmapi.llm_utils.BaseLlmArgs

get_executor_config( _hf_model_dir:Path|None=None, tokenizer:TokenizerBase|None=None, )→ExecutorConfig[source]#

get_runtime_sizes()→Tuple[int,int,int,int]#

validatorinit_backend » backend[source]#

validatorinit_build_config » allfields#: Create a default BuildConfig if none is provided.

validatorset_default_max_input_len » allfields#

validatorset_runtime_knobs_from_build_config » allfields#

validatorsync_quant_config_with_kv_cache_config_dtype » allfields[source]#

validatorvalidate_and_init_tokenizer » allfields#: Initialize tokenizer based on configuration.

validatorvalidate_attention_dp_config » allfields[source]#

Validate attention DP configuration.

Ensures that:

1. If attention_dp_config.enable_balance is true, attention_dp_config.batching_wait_iters must be greater than or equal to 0.
2. If attention_dp_config.enable_balance is true, attention_dp_config.timeout_iters must be greater than or equal to 0.
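The two rules above can be expressed as a small check. The sketch below uses a hypothetical dataclass with only the three fields the validator mentions. The class name and its defaults are assumptions, not the real AttentionDpConfig.

```python
from dataclasses import dataclass


@dataclass
class AttentionDpSettings:
    """Hypothetical stand-in for AttentionDpConfig, limited to the fields above."""
    enable_balance: bool = False
    batching_wait_iters: int = 0
    timeout_iters: int = 0


def validate_attention_dp(cfg: AttentionDpSettings) -> None:
    # Both checks apply only when load balancing is enabled.
    if not cfg.enable_balance:
        return
    if cfg.batching_wait_iters < 0:
        raise ValueError("batching_wait_iters must be >= 0 when enable_balance is true")
    if cfg.timeout_iters < 0:
        raise ValueError("timeout_iters must be >= 0 when enable_balance is true")


validate_attention_dp(AttentionDpSettings(enable_balance=True, batching_wait_iters=10))
```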

validatorvalidate_batch_wait_max_tokens_ratio » allfields[source]#

validatorvalidate_batch_wait_timeout_iters » allfields[source]#

validatorvalidate_batch_wait_timeout_ms » allfields[source]#: Validate batch wait timeout.

validatorvalidate_build_config_remaining » allfields#

validatorvalidate_build_config_with_runtime_params » allfields#

validatorvalidate_checkpoint_format » allfields[source]#

validatorvalidate_cuda_graph_config » allfields[source]#

Validate CUDA graph configuration.

Ensures that:

1. If cuda_graph_config.batch_sizes is provided, cuda_graph_config.max_batch_size must be 0.
2. If cuda_graph_config.batch_sizes is not provided, it is generated based on cuda_graph_config.max_batch_size.
3. If both are provided, cuda_graph_config.batch_sizes must match the generated values.

validatorvalidate_dtype » dtype#

validatorvalidate_gpus_per_node » gpus_per_node#

validatorvalidate_load_balancer » allfields[source]#

validatorvalidate_lora_config_consistency » allfields#

validatorvalidate_model » model#

validatorvalidate_model_format_misc » allfields#

Load the model format and do the following:

Load the build_config if an engine is given.
Load the parallel_config if a checkpoint is given.

validatorvalidate_parallel_config » allfields#

validatorvalidate_peft_cache_config » allfields#

validatorvalidate_ray_worker_extension_cls » allfields[source]#

validatorvalidate_runtime_args » allfields#

validatorvalidate_speculative_config » allfields#

validatorvalidate_stream_interval » allfields[source]#

validatorvalidate_torch_compile_config » allfields[source]#

warn_on_unstable_feature_usage()→TorchLlmArgs[source]#: Warn on unstable feature usage.

decoding_config:object|None#

Read-only data descriptor used to emit a runtime deprecation warning before accessing a deprecated field.

msg#: The deprecation message to be emitted.

wrapped_property#: The property instance if the deprecated field is a computed field, or None.

field_name#: The name of the field being deprecated.

propertyextra_resource_managers:Dict[str,object]#

propertymodel_format:_ModelFormatKind#

propertyparallel_config:_ParallelConfig#

propertyquant_config:QuantConfig#

propertyspeculative_model_dir:_ModelFormatKind|None#

propertyspeculative_model_format:_ModelFormatKind#

classtensorrt_llm.llmapi.TrtLlmArgs( *, model:str|~pathlib.Path, tokenizer:str|~pathlib.Path|~transformers.tokenization_utils_base.PreTrainedTokenizerBase|~tensorrt_llm.llmapi.tokenizer.TokenizerBase|None=None, tokenizer_mode:~typing.Literal['auto', 'slow']='auto', skip_tokenizer_init:bool=False, trust_remote_code:bool=False, tensor_parallel_size:int=1, dtype:str='auto', revision:str|None=None, tokenizer_revision:str|None=None, pipeline_parallel_size:int=1, context_parallel_size:int=1, gpus_per_node:int|None=None, moe_cluster_parallel_size:int|None=None, moe_tensor_parallel_size:int|None=None, moe_expert_parallel_size:int|None=None, enable_attention_dp:bool=False, enable_lm_head_tp_in_adp:bool=False, pp_partition:~typing.List[int]|None=None, cp_config:dict|None=<factory>, load_format:~typing.Literal['auto', 'dummy']='auto', fail_fast_on_attention_window_too_large:bool=False, enable_lora:bool=False, lora_config:~tensorrt_llm.lora_helper.LoraConfig|None=None, kv_cache_config:~tensorrt_llm.llmapi.llm_args.KvCacheConfig=<factory>, enable_chunked_prefill:bool=False, guided_decoding_backend:~typing.Literal['xgrammar', 'llguidance']|None=None, batched_logits_processor:object|None=None, iter_stats_max_iterations:int|None=None, request_stats_max_iterations:int|None=None, peft_cache_config:~tensorrt_llm.llmapi.llm_args.PeftCacheConfig|None=None, scheduler_config:~tensorrt_llm.llmapi.llm_args.SchedulerConfig=<factory>, cache_transceiver_config:~tensorrt_llm.llmapi.llm_args.CacheTransceiverConfig|None=None, sparse_attention_config:~tensorrt_llm.llmapi.llm_args.RocketSparseAttentionConfig|~tensorrt_llm.llmapi.llm_args.DeepSeekSparseAttentionConfig|None=None, 
speculative_config:~tensorrt_llm.llmapi.llm_args.DraftTargetDecodingConfig|~tensorrt_llm.llmapi.llm_args.EagleDecodingConfig|~tensorrt_llm.llmapi.llm_args.LookaheadDecodingConfig|~tensorrt_llm.llmapi.llm_args.MedusaDecodingConfig|~tensorrt_llm.llmapi.llm_args.MTPDecodingConfig|~tensorrt_llm.llmapi.llm_args.NGramDecodingConfig|~tensorrt_llm.llmapi.llm_args.UserProvidedDecodingConfig|~tensorrt_llm.llmapi.llm_args.SaveHiddenStatesDecodingConfig|~tensorrt_llm.llmapi.llm_args.AutoDecodingConfig|None=None, max_batch_size:int|None=None, max_input_len:int|None=None, max_seq_len:int|None=None, max_beam_width:int|None=None, max_num_tokens:int|None=8192, gather_generation_logits:bool=False, num_postprocess_workers:int=0, postprocess_tokenizer_dir:str|None=None, reasoning_parser:str|None=None, decoding_config:object|None=None, _mpi_session:object|None=None, otlp_traces_endpoint:str|None=None, backend:str|None=None, return_perf_metrics:bool=False, orchestrator_type:~typing.Literal['rpc', 'ray']|None=None, enable_tqdm:bool=False, workspace:str|None=None, enable_build_cache:object=False, extended_runtime_perf_knob_config:~tensorrt_llm.llmapi.llm_args.ExtendedRuntimePerfKnobConfig|None=None, calib_config:~tensorrt_llm.llmapi.llm_args.CalibConfig|None=None, quant_config:~tensorrt_llm.models.modeling_utils.QuantConfig|None=None, embedding_parallel_mode:str='SHARDING_ALONG_VOCAB', fast_build:bool=False, build_config:~tensorrt_llm.builder.BuildConfig|None=None, enable_prompt_adapter:bool=False, max_prompt_adapter_token:int=0, batching_type:~tensorrt_llm.llmapi.llm_args.BatchingType|None=None, normalize_log_probs:bool=False, )[source]#

Bases:BaseLlmArgs

fieldbackend:str|None=None#: The backend to use for this LLM instance.

fieldbatched_logits_processor:object|None=None#: Batched logits processor.

fieldbatching_type:BatchingType|None=None#: Batching type.

fieldbuild_config:BuildConfig|None=None#: Build config.

fieldcache_transceiver_config:CacheTransceiverConfig|None=None#: Cache transceiver config.

fieldcalib_config:CalibConfig|None=None#: Calibration config.

fieldcontext_parallel_size:int=1#: The context parallel size.

fieldcp_config:dict|None[Optional]#: Context parallel config.

fielddtype:str='auto'#: The data type to use for the model.

fieldembedding_parallel_mode:str='SHARDING_ALONG_VOCAB'#: The embedding parallel mode.

fieldenable_attention_dp:bool=False#: Enable attention data parallel.

fieldenable_build_cache:object=False#: Enable build cache.

fieldenable_chunked_prefill:bool=False#: Enable chunked prefill.

fieldenable_lm_head_tp_in_adp:bool=False#: Enable tensor parallelism (TP) for the LM head in attention data parallel (DP) mode.

fieldenable_lora:bool=False#: Enable LoRA.

fieldenable_prompt_adapter:bool=False#: Enable prompt adapter.

fieldenable_tqdm:bool=False#: Enable tqdm for progress bar.

fieldextended_runtime_perf_knob_config:ExtendedRuntimePerfKnobConfig|None=None#: Extended runtime perf knob config.

fieldfail_fast_on_attention_window_too_large:bool=False#: Fail fast when attention window is too large to fit even a single sequence in the KV cache.

fieldfast_build:bool=False#: Enable fast build.

fieldgather_generation_logits:bool=False#: Gather generation logits.

fieldgpus_per_node:int|None=None#: The number of GPUs per node.

fieldguided_decoding_backend:Literal['xgrammar','llguidance']|None=None#: Guided decoding backend. llguidance is supported in the PyTorch backend only.

fielditer_stats_max_iterations:int|None=None#: The maximum number of iterations for iter stats.

fieldkv_cache_config:KvCacheConfig[Optional]#: KV cache config.

fieldload_format:Literal['auto','dummy']='auto'#: The format to load the model.

fieldlora_config:LoraConfig|None=None#: LoRA configuration for the model.

fieldmax_batch_size:int|None=None#: The maximum batch size.

fieldmax_beam_width:int|None=None#: The maximum beam width.

fieldmax_input_len:int|None=None#: The maximum input length.

fieldmax_num_tokens:int|None=8192#: The maximum number of tokens.

fieldmax_prompt_adapter_token:int=0#: The maximum number of prompt adapter tokens.

fieldmax_seq_len:int|None=None#: The maximum sequence length.

fieldmodel:str|Path[Required]#: The path to the model checkpoint or the model name from the Hugging Face Hub.

fieldmoe_cluster_parallel_size:int|None=None#: The cluster parallel size for MoE models' expert weights.

fieldmoe_expert_parallel_size:int|None=None#: The expert parallel size for MoE models' expert weights.

fieldmoe_tensor_parallel_size:int|None=None#: The tensor parallel size for MoE models' expert weights.

fieldmpi_session:object|None=None(alias'_mpi_session')#: The optional MPI session to use for this LLM instance.

fieldnormalize_log_probs:bool=False#: Normalize log probabilities.

fieldnum_postprocess_workers:int=0#: The number of processes used for postprocessing the generated tokens, including detokenization.

fieldorchestrator_type:Literal['rpc','ray']|None=None#: The orchestrator type to use. Defaults to None, which uses MPI.

fieldotlp_traces_endpoint:str|None=None#: Target URL to which OpenTelemetry traces will be sent.

fieldpeft_cache_config:PeftCacheConfig|None=None#: PEFT cache config.

fieldpipeline_parallel_size:int=1#: The pipeline parallel size.

fieldpostprocess_tokenizer_dir:str|None=None#: The path to the tokenizer directory for postprocessing.

fieldpp_partition:List[int]|None=None#: Pipeline parallel partition: a list giving the number of layers assigned to each rank.

fieldquant_config:QuantConfig|None=None#: Quantization config.

fieldreasoning_parser:str|None=None#: The parser to separate reasoning content from output.

fieldrequest_stats_max_iterations:int|None=None#: The maximum number of iterations for request stats.

fieldreturn_perf_metrics:bool=False#: Return perf metrics.

fieldrevision:str|None=None#: The revision to use for the model.

fieldscheduler_config:SchedulerConfig[Optional]#: Scheduler config.

fieldskip_tokenizer_init:bool=False#: Whether to skip the tokenizer initialization.

fieldsparse_attention_config:SparseAttentionConfig|None=None#: Sparse attention config.

fieldspeculative_config:SpeculativeConfig=None#: Speculative decoding config.

fieldtensor_parallel_size:int=1#: The tensor parallel size.

fieldtokenizer:str|Path|TokenizerBase|PreTrainedTokenizerBase|None=None#: The path to the tokenizer checkpoint or the tokenizer name from the Hugging Face Hub.

fieldtokenizer_mode:Literal['auto','slow']='auto'#: The mode to initialize the tokenizer.

fieldtokenizer_revision:str|None=None#: The revision to use for the tokenizer.

fieldtrust_remote_code:bool=False#: Whether to trust the remote code.

fieldworkspace:str|None=None#: The workspace for the model.

classConfig#

Bases:object

extra='forbid'#

__init__(**data:Any)→None#

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

classmethodfrom_kwargs(
**kwargs:Any,
)→BaseLlmArgs#

CreateLlmArgs instance from kwargs.

Parameters:: kwargs (Any) – Arguments passed toLlmArgs constructor.
Returns:: TheBaseLlmArgs instance.
Return type:: tensorrt_llm.llmapi.llm_utils.BaseLlmArgs
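from_kwargs builds an args object from keyword arguments, and the Config class above sets extra='forbid'. The stdlib sketch below uses a hypothetical ToyArgs dataclass, not the real LlmArgs, to illustrate what forbidding extras buys: a misspelled option fails loudly instead of being silently ignored.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class ToyArgs:
    """Hypothetical stand-in; the field names mirror a few documented LlmArgs fields."""
    model: str
    tensor_parallel_size: int = 1
    pipeline_parallel_size: int = 1
    revision: Optional[str] = None

    @classmethod
    def from_kwargs(cls, **kwargs: Any) -> "ToyArgs":
        # Mimic extra='forbid': a misspelled option is an error instead of being dropped.
        known = {f.name for f in fields(cls)}
        unknown = sorted(set(kwargs) - known)
        if unknown:
            raise ValueError(f"unexpected arguments: {unknown}")
        return cls(**kwargs)


args = ToyArgs.from_kwargs(model="my-model", tensor_parallel_size=2)
print(args.tensor_parallel_size, args.pipeline_parallel_size)  # 2 1
```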

get_runtime_sizes()→Tuple[int,int,int,int]#

validatorinit_build_config » allfields#: Creates a default BuildConfig if none is provided.

validatorinit_calib_config » calib_config[source]#

validatorset_default_max_input_len » allfields#

validatorset_runtime_knobs_from_build_config » allfields#

validatorsetup_embedding_parallel_mode » allfields[source]#

validatorvalidate_and_init_tokenizer » allfields#: Initializes the tokenizer based on the configuration.

validatorvalidate_build_config_remaining » allfields#

validatorvalidate_build_config_with_runtime_params » allfields#

validatorvalidate_dtype » dtype#

validatorvalidate_enable_build_cache » allfields[source]#

validatorvalidate_gpus_per_node » gpus_per_node#

validatorvalidate_kv_cache_dtype » allfields[source]#

validatorvalidate_lora_config_consistency » allfields#

validatorvalidate_model » model#

validatorvalidate_model_format_misc » allfields#

Loads the model format and does the following:

Loads the build_config if the model is an engine.
Loads the parallel_config if the model is a checkpoint.

validatorvalidate_parallel_config » allfields#

validatorvalidate_peft_cache_config » allfields#

validatorvalidate_quant_config » quant_config[source]#

validatorvalidate_runtime_args » allfields#

validatorvalidate_speculative_config » allfields#

decoding_config:object|None#

Read-only data descriptor used to emit a runtime deprecation warning before accessing a deprecated field.

msg#: The deprecation message to be emitted.

wrapped_property#: The property instance if the deprecated field is a computed field, or None.

field_name#: The name of the field being deprecated.
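The descriptor documented above pairs a deprecation warning with normal attribute access. The following is a minimal stdlib sketch of that pattern. It assumes the attribute names field_name, msg, and wrapped_property listed above. The class names DeprecatedFieldDescriptor and Args are hypothetical, and the real implementation may differ.

```python
import warnings
from typing import Any, Optional


class DeprecatedFieldDescriptor:
    """Hypothetical read-only data descriptor that warns before returning a deprecated field."""

    def __init__(self, field_name: str, msg: str,
                 wrapped_property: Optional[property] = None) -> None:
        self.field_name = field_name
        self.msg = msg
        self.wrapped_property = wrapped_property

    def __get__(self, obj: Any, objtype: Optional[type] = None) -> Any:
        if obj is None:
            return self  # accessed on the class itself: no warning
        warnings.warn(self.msg, DeprecationWarning, stacklevel=2)
        if self.wrapped_property is not None:
            # Computed field: delegate to the original property.
            return self.wrapped_property.__get__(obj, objtype)
        return obj.__dict__.get(self.field_name)

    def __set__(self, obj: Any, value: Any) -> None:
        # Defining __set__ makes this a data descriptor, so it wins over the instance __dict__.
        raise AttributeError(f"{self.field_name} is read-only")


class Args:
    """Hypothetical holder with one deprecated field."""
    decoding_config = DeprecatedFieldDescriptor(
        "decoding_config", "decoding_config is deprecated")

    def __init__(self, decoding_config: Any = None) -> None:
        # Write straight into __dict__ because the descriptor's __set__ refuses writes.
        self.__dict__["decoding_config"] = decoding_config
```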

propertymodel_format:_ModelFormatKind#

propertyparallel_config:_ParallelConfig#

propertyspeculative_model_dir:_ModelFormatKind|None#

propertyspeculative_model_format:_ModelFormatKind#

Bases:DecodingBaseConfig

Configuration for auto speculative decoding.

This config automatically selects a suitable draft-model-free speculation algorithm using heuristics.

Attributes that are inherited from the base class are ignored.

fieldacceptance_length_threshold:float|None=None#

fieldacceptance_window:int|None=None#

fielddraft_len_schedule:dict[int,int]|None=None#

fieldload_format:str|None=None#

fieldmax_concurrency:int|None=None#

fieldmax_draft_len:int|None=None#

fieldmax_total_draft_tokens:int|None=None#

fieldspeculative_model_dir:str|Path|None=None#

classConfig#

Bases:object

extra='forbid'#

__init__(**kwargs)[source]#

Create a new model by parsing and validating input data from keyword arguments.

Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

classmethodconstruct(
_fields_set:set[str]|None=None,
**values:Any,
)→Self#

Returns a copy of the model.

!!! warning “Deprecated”: This method is now deprecated; use model_copy instead.

If you need include or exclude, use:

```python
data = self.model_dump(include=include, exclude=exclude, round_trip=True)
data = {**data, **(update or {})}
copied = self.model_validate(data)
```

Parameters:

include – Optional set or mapping specifying which fields to include in the copied model.
exclude – Optional set or mapping specifying which fields to exclude in the copied model.
update – Optional dictionary of field-value pairs to override field values in the copied model.
deep – If True, the values of fields that are Pydantic models will be deep-copied.

Returns:

A copy of the model with included, excluded and updated fields as specified.
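The replacement recipe above dumps the model, merges update, and re-validates the result. The following is a stdlib-only analogue. It assumes a dataclass whose __post_init__ plays the role of validation, and Point and copy_with are hypothetical names. An excluded field falls back to its default on the rebuilt object.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional, Set


@dataclass
class Point:
    """Hypothetical model; __post_init__ stands in for pydantic validation."""
    x: int
    y: int = 0

    def __post_init__(self) -> None:
        if not isinstance(self.x, int) or not isinstance(self.y, int):
            raise TypeError("x and y must be int")


def copy_with(obj: Point,
              exclude: Optional[Set[str]] = None,
              update: Optional[Dict[str, Any]] = None) -> Point:
    # 1. Dump the current values, dropping excluded fields.
    data = {k: v for k, v in asdict(obj).items() if k not in (exclude or set())}
    # 2. Merge the overrides on top.
    data = {**data, **(update or {})}
    # 3. Rebuild through the constructor so validation runs again.
    return type(obj)(**data)


p = Point(1, 2)
print(copy_with(p, update={"y": 5}))  # Point(x=1, y=5)
print(copy_with(p, exclude={"y"}))    # Point(x=1, y=0)
```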

dict( *, include:set[int]|set[str]|Mapping[int,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|Mapping[str,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|None=None, exclude:set[int]|set[str]|Mapping[int,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|Mapping[str,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|None=None, by_alias:bool=False, exclude_unset:bool=False, exclude_defaults:bool=False, exclude_none:bool=False, )→Dict[str,Any]#

classmethodfrom_dict(data:dict)[source]#

classmethodfrom_orm(obj:Any)→Self#

json(
*,
include:set[int]|set[str]|Mapping[int,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|Mapping[str,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|None=None,
exclude:set[int]|set[str]|Mapping[int,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|Mapping[str,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|None=None,
by_alias:bool=False,
exclude_unset:bool=False,
exclude_defaults:bool=False,
exclude_none:bool=False,
encoder:Callable[[Any],Any]|None=PydanticUndefined,
models_as_dict:bool=PydanticUndefined,
**dumps_kwargs:Any,
)→str#

classmethodmodel_construct(
_fields_set:set[str]|None=None,
**values:Any,
)→Self#

Creates a new instance of the Model class with validated data.

Creates a new model setting __dict__ and __pydantic_fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed.

!!! note: model_construct() generally respects the model_config.extra setting on the provided model. That is, if model_config.extra == ‘allow’, then all extra passed values are added to the model instance’s __dict__ and __pydantic_extra__ fields. If model_config.extra == ‘ignore’ (the default), then all extra passed values are ignored. Because no validation is performed with a call to model_construct(), having model_config.extra == ‘forbid’ does not result in an error if extra values are passed, but they will be ignored.

Parameters:

_fields_set – A set of field names that were originally explicitly set during instantiation. If provided, this is directly used for the [model_fields_set][pydantic.BaseModel.model_fields_set] attribute. Otherwise, the field names from the values argument will be used.
values – Trusted or pre-validated data dictionary.

Returns:

A new instance of the Model class with validated data.
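Because model_construct skips validation, invalid "trusted" data goes through unchecked. The following is a rough stdlib illustration using the hypothetical names Limits and construct_trusted. Unlike model_construct, this simplified version does not apply default values.

```python
from dataclasses import dataclass
from typing import Any, Type, TypeVar

T = TypeVar("T")


@dataclass
class Limits:
    """Hypothetical model with one validation rule."""
    max_batch_size: int

    def __post_init__(self) -> None:
        if self.max_batch_size <= 0:
            raise ValueError("max_batch_size must be positive")


def construct_trusted(cls: Type[T], **values: Any) -> T:
    """Rough analogue of model_construct: skip __init__ and validation entirely."""
    obj = cls.__new__(cls)
    obj.__dict__.update(values)
    return obj


unchecked = construct_trusted(Limits, max_batch_size=-1)
print(unchecked.max_batch_size)  # -1: the invalid value is accepted silently
```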

model_copy( *, update:Mapping[str,Any]|None=None, deep:bool=False, )→Self#

!!! abstract “Usage Documentation”: [model_copy](../concepts/serialization.md#model_copy)

Returns a copy of the model.

!!! note: The underlying instance’s [__dict__][object.__dict__] attribute is copied. This might have unexpected side effects if you store anything in it, on top of the model fields (e.g. the value of [cached properties][functools.cached_property]).

Parameters:

update – Values to change/add in the new model. Note: the data is not validated before creating the new model. You should trust this data.
deep – Set to True to make a deep copy of the model.

Returns:

New model instance.
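The deep flag matters when a field holds a mutable container. The following sketch uses a hypothetical Schedule dataclass and assumes model_copy() behaves like a shallow copy.copy and model_copy(deep=True) like copy.deepcopy, which simplifies pydantic's internals.

```python
import copy
from dataclasses import dataclass, field
from typing import List


@dataclass
class Schedule:
    """Hypothetical model holding a mutable nested value."""
    thresholds: List[int] = field(default_factory=list)


original = Schedule([1, 4])
shallow = copy.copy(original)   # analogous to model_copy(): nested list is shared
deep = copy.deepcopy(original)  # analogous to model_copy(deep=True): nested list is duplicated

original.thresholds.append(8)
print(shallow.thresholds)  # [1, 4, 8]
print(deep.thresholds)     # [1, 4]
```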

!!! abstract “Usage Documentation”: [model_dump](../concepts/serialization.md#modelmodel_dump)

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

Parameters:

mode – The mode in which to_python should run. If mode is ‘json’, the output will only contain JSON serializable types. If mode is ‘python’, the output may contain non-JSON-serializable Python objects.
include – A set of fields to include in the output.
exclude – A set of fields to exclude from the output.
context – Additional context to pass to the serializer.
by_alias – Whether to use the field’s alias in the dictionary key if defined.
exclude_unset – Whether to exclude fields that have not been explicitly set.
exclude_defaults – Whether to exclude fields that are set to their default value.
exclude_none – Whether to exclude fields that have a value of None.
round_trip – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings – How to handle serialization errors. False/“none” ignores them, True/“warn” logs errors, “error” raises a [PydanticSerializationError][pydantic_core.PydanticSerializationError].
fallback – A function to call when an unknown value is encountered. If not provided, a [PydanticSerializationError][pydantic_core.PydanticSerializationError] error is raised.
serialize_as_any – Whether to serialize fields with duck-typing serialization behavior.

Returns:

A dictionary representation of the model.

!!! abstract “Usage Documentation”: [model_dump_json](../concepts/serialization.md#modelmodel_dump_json)

Generates a JSON representation of the model using Pydantic’s to_json method.

Parameters:

indent – Indentation to use in the JSON output. If None is passed, the output will be compact.
include – Field(s) to include in the JSON output.
exclude – Field(s) to exclude from the JSON output.
context – Additional context to pass to the serializer.
by_alias – Whether to serialize using field aliases.
exclude_unset – Whether to exclude fields that have not been explicitly set.
exclude_defaults – Whether to exclude fields that are set to their default value.
exclude_none – Whether to exclude fields that have a value of None.
round_trip – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings – How to handle serialization errors. False/“none” ignores them, True/“warn” logs errors, “error” raises a [PydanticSerializationError][pydantic_core.PydanticSerializationError].
fallback – A function to call when an unknown value is encountered. If not provided, a [PydanticSerializationError][pydantic_core.PydanticSerializationError] error is raised.
serialize_as_any – Whether to serialize fields with duck-typing serialization behavior.

Returns:

A JSON string representation of the model.

Generates a JSON schema for a model class.

Parameters:

by_alias – Whether to use attribute aliases or not.
ref_template – The reference template.
schema_generator – To override the logic used to generate the JSON schema, as a subclass of GenerateJsonSchema with your desired modifications.
mode – The mode in which to generate the schema.

Returns:

The JSON schema for the given model class.

classmethodmodel_parametrized_name( params:tuple[type[Any],...], )→str#

Compute the class name for parametrizations of generic classes.

This method can be overridden to achieve a custom naming scheme for generic BaseModels.

Parameters:: params – Tuple of types of the class. Given a generic class Model with 2 type variables and a concrete model Model[str, int], the value (str, int) would be passed to params.
Returns:: String representing the new class where params are passed to cls as type variables.
Raises:: TypeError – Raised when trying to generate concrete names for non-generic models.
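The following small sketch shows the naming scheme described above. It approximates pydantic's implementation rather than reproducing it, and parametrized_name and its is_generic flag are hypothetical.

```python
from typing import Any, Tuple


def parametrized_name(cls_name: str, params: Tuple[Any, ...],
                      is_generic: bool = True) -> str:
    """Approximation of the default naming scheme for parametrized generic models."""
    if not is_generic:
        # Mirrors the documented TypeError for non-generic models.
        raise TypeError(f"{cls_name} is not generic; cannot build a concrete name")
    # Use a type's __name__ when it has one, otherwise fall back to repr().
    names = ", ".join(getattr(p, "__name__", repr(p)) for p in params)
    return f"{cls_name}[{names}]"


print(parametrized_name("Model", (str, int)))  # Model[str, int]
```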

model_post_init(context:Any,/)→None#

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:

self – The BaseModel instance.
context – The context.

classmethodmodel_rebuild( *, force:bool=False, raise_errors:bool=True, _parent_namespace_depth:int=2, _types_namespace:MappingNamespace|None=None, )→bool|None#

Try to rebuild the pydantic-core schema for the model.

This may be necessary when one of the annotations is a ForwardRef which could not be resolved during the initial attempt to build the schema, and automatic rebuilding fails.

Parameters:

force – Whether to force the rebuilding of the model schema, defaults to False.
raise_errors – Whether to raise errors, defaults to True.
_parent_namespace_depth – The depth level of the parent namespace, defaults to 2.
_types_namespace – The types namespace, defaults to None.

Returns:

Returns None if the schema is already “complete” and rebuilding was not required. If rebuilding _was_ required, returns True if rebuilding was successful, otherwise False.
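To see why a ForwardRef can fail on the first attempt and succeed later, the sketch below uses only typing.get_type_hints, not pydantic. Node, Child, and try_resolve are illustrative names. Pydantic performs the equivalent resolution internally when model_rebuild is called.

```python
import typing
from typing import Any, Dict


class Node:
    # "Child" does not exist yet, so this annotation is an unresolved forward reference.
    child: "Child"


def try_resolve(cls: type, namespace: Dict[str, Any]) -> bool:
    """Return True if every annotation on cls can be resolved in namespace."""
    try:
        typing.get_type_hints(cls, globalns=namespace)
    except NameError:
        return False
    return True


namespace: Dict[str, Any] = {}
first_attempt = try_resolve(Node, namespace)   # Child is not defined yet


class Child:
    pass


namespace["Child"] = Child
second_attempt = try_resolve(Node, namespace)  # the "rebuild" now succeeds
print(first_attempt, second_attempt)  # False True
```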

Validate a pydantic model instance.

Parameters:

obj – The object to validate.
strict – Whether to enforce types strictly.
from_attributes – Whether to extract data from object attributes.
context – Additional context to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Raises:

ValidationError – If the object could not be validated.

Returns:

The validated model instance.

!!! abstract “Usage Documentation”: [JSON Parsing](../concepts/json.md#json-parsing)

Validate the given JSON data against the Pydantic model.

Parameters:

json_data – The JSON data to validate.
strict – Whether to enforce types strictly.
context – Extra variables to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Returns:

The validated Pydantic model.

Raises:

ValidationError – If json_data is not a JSON string or the object could not be validated.
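The following stdlib sketch applies the checks described above to a flat JSON object. It assumes unknown keys are rejected, as with extra='forbid'. validate_json is a hypothetical helper and implements only a tiny subset of pydantic's validation.

```python
import json
from typing import Any, Dict


def validate_json(json_data: Any, schema: Dict[str, type]) -> Dict[str, Any]:
    """Hypothetical analogue of model_validate_json for a flat object."""
    if not isinstance(json_data, (str, bytes, bytearray)):
        raise ValueError("json_data must be a JSON string")
    try:
        obj = json.loads(json_data)
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON: {exc}") from exc
    if not isinstance(obj, dict):
        raise ValueError("expected a JSON object")
    unknown = set(obj) - set(schema)
    if unknown:
        raise ValueError(f"unexpected fields: {sorted(unknown)}")
    for name, expected in schema.items():
        if name in obj and not isinstance(obj[name], expected):
            raise ValueError(f"{name!r} must be of type {expected.__name__}")
    return obj


print(validate_json('{"max_draft_len": 4}', {"max_draft_len": int}))  # {'max_draft_len': 4}
```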

classmethodmodel_validate_strings( obj:Any, *, strict:bool|None=None, context:Any|None=None, by_alias:bool|None=None, by_name:bool|None=None, )→Self#

Validate the given object with string data against the Pydantic model.

Parameters:

obj – The object containing string data to validate.
strict – Whether to enforce types strictly.
context – Extra variables to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Returns:

The validated Pydantic model.

classmethodparse_file( path:str|Path, *, content_type:str|None=None, encoding:str='utf8', proto:DeprecatedParseProtocol|None=None, allow_pickle:bool=False, )→Self#

classmethodparse_obj(obj:Any)→Self#

classmethodparse_raw( b:str|bytes, *, content_type:str|None=None, encoding:str='utf8', proto:DeprecatedParseProtocol|None=None, allow_pickle:bool=False, )→Self#

classmethodschema( by_alias:bool=True, ref_template:str='#/$defs/{model}', )→Dict[str,Any]#

classmethodschema_json(
*,
by_alias:bool=True,
ref_template:str='#/$defs/{model}',
**dumps_kwargs:Any,
)→str#

supports_backend(backend:str)→bool[source]#: Override this if the speculation algorithm does not support all of the possible backends.
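The following sketch shows the override pattern described above, using hypothetical class names. The backend strings are illustrative placeholders, not a confirmed list of TensorRT-LLM backends.

```python
class DecodingConfigSketch:
    """Hypothetical stand-in for a decoding config base class."""

    def supports_backend(self, backend: str) -> bool:
        # Default: assume the algorithm works on every backend.
        return True


class SingleBackendConfig(DecodingConfigSketch):
    """Hypothetical algorithm that only runs on one backend."""

    _SUPPORTED = frozenset({"pytorch"})  # illustrative backend name

    def supports_backend(self, backend: str) -> bool:
        return backend in self._SUPPORTED


print(DecodingConfigSketch().supports_backend("tensorrt"))  # True
print(SingleBackendConfig().supports_backend("tensorrt"))   # False
```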

classmethodupdate_forward_refs(**localns:Any)→None#

validate()→None#: Do any additional error checking here.

validatorvalidate_draft_len_schedule_and_sort » draft_len_schedule#: Validate and sort draft_len_schedule by batch size thresholds.
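The validator is documented only as validating draft_len_schedule and sorting it by batch-size threshold. The following sketch assumes that thresholds must be positive integers and draft lengths non-negative integers. validate_and_sort_schedule is hypothetical.

```python
from typing import Dict, Optional


def validate_and_sort_schedule(
        schedule: Optional[Dict[int, int]]) -> Optional[Dict[int, int]]:
    """Hypothetical draft_len_schedule check: validate, then sort by threshold."""
    if schedule is None:
        return None
    for batch_size, draft_len in schedule.items():
        if not isinstance(batch_size, int) or batch_size <= 0:
            raise ValueError(f"batch size threshold must be a positive int, got {batch_size!r}")
        if not isinstance(draft_len, int) or draft_len < 0:
            raise ValueError(f"draft length must be a non-negative int, got {draft_len!r}")
    # Dicts keep insertion order, so rebuilding from sorted items orders the thresholds.
    return dict(sorted(schedule.items()))


print(validate_and_sort_schedule({32: 1, 4: 4, 16: 2}))  # {4: 4, 16: 2, 32: 1}
```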

decoding_type:ClassVar[str]='AUTO'#

model_computed_fields={}#

model_config:ClassVar[ConfigDict]={'extra':'forbid'}#: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

propertymodel_extra:dict[str,Any]|None#

Get extra fields set during validation.

Returns:: A dictionary of extra fields, or None if config.extra is not set to “allow”.

model_fields={'acceptance_length_threshold':FieldInfo(annotation=Union[float,NoneType],required=False,default=None),'acceptance_window':FieldInfo(annotation=Union[int,NoneType],required=False,default=None),'draft_len_schedule':FieldInfo(annotation=Union[dict[int,int],NoneType],required=False,default=None),'load_format':FieldInfo(annotation=Union[str,NoneType],required=False,default=None),'max_concurrency':FieldInfo(annotation=Union[int,NoneType],required=False,default=None),'max_draft_len':FieldInfo(annotation=Union[int,NoneType],required=False,default=None),'max_total_draft_tokens':FieldInfo(annotation=Union[int,NoneType],required=False,default=None),'speculative_model_dir':FieldInfo(annotation=Union[str,Path,NoneType],required=False,default=None)}#

propertymodel_fields_set:set[str]#

Returns the set of fields that have been explicitly set on this model instance.

Returns:

A set of strings representing the fields that have been set, i.e. that were not filled from defaults.

propertyspec_dec_mode#

classtensorrt_llm.llmapi.AttentionDpConfig( *, enable_balance:bool=False, timeout_iters:int=50, batching_wait_iters:int=10, )[source]#

Bases:StrictBaseModel

Configuration for attention data parallelism (attention DP).

fieldbatching_wait_iters:int=10#: The number of iterations to wait for batching.

fieldenable_balance:bool=False#: Whether to enable balance.

fieldtimeout_iters:int=50#: The number of iterations to wait before timing out.

classConfig#

Bases:object

extra='forbid'#

__init__(**data:Any)→None#

Create a new model by parsing and validating input data from keyword arguments.

Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

classmethodconstruct(
_fields_set:set[str]|None=None,
**values:Any,
)→Self#

Returns a copy of the model.

!!! warning “Deprecated”: This method is now deprecated; use model_copy instead.

If you need include or exclude, use:

```python
data = self.model_dump(include=include, exclude=exclude, round_trip=True)
data = {**data, **(update or {})}
copied = self.model_validate(data)
```

Parameters:

include – Optional set or mapping specifying which fields to include in the copied model.
exclude – Optional set or mapping specifying which fields to exclude in the copied model.
update – Optional dictionary of field-value pairs to override field values in the copied model.
deep – If True, the values of fields that are Pydantic models will be deep-copied.

Returns:

A copy of the model with included, excluded and updated fields as specified.

dict( *, include:set[int]|set[str]|Mapping[int,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|Mapping[str,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|None=None, exclude:set[int]|set[str]|Mapping[int,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|Mapping[str,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|None=None, by_alias:bool=False, exclude_unset:bool=False, exclude_defaults:bool=False, exclude_none:bool=False, )→Dict[str,Any]#

classmethodfrom_dict(data:dict)[source]#

classmethodfrom_orm(obj:Any)→Self#

json(
*,
include:set[int]|set[str]|Mapping[int,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|Mapping[str,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|None=None,
exclude:set[int]|set[str]|Mapping[int,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|Mapping[str,set[int]|set[str]|Mapping[int,IncEx|bool]|Mapping[str,IncEx|bool]|bool]|None=None,
by_alias:bool=False,
exclude_unset:bool=False,
exclude_defaults:bool=False,
exclude_none:bool=False,
encoder:Callable[[Any],Any]|None=PydanticUndefined,
models_as_dict:bool=PydanticUndefined,
**dumps_kwargs:Any,
)→str#

classmethodmodel_construct(
_fields_set:set[str]|None=None,
**values:Any,
)→Self#

Creates a new instance of the Model class with validated data.

Creates a new model setting __dict__ and __pydantic_fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed.

!!! note: model_construct() generally respects the model_config.extra setting on the provided model. That is, if model_config.extra == ‘allow’, then all extra passed values are added to the model instance’s __dict__ and __pydantic_extra__ fields. If model_config.extra == ‘ignore’ (the default), then all extra passed values are ignored. Because no validation is performed with a call to model_construct(), having model_config.extra == ‘forbid’ does not result in an error if extra values are passed, but they will be ignored.

Parameters:

_fields_set – A set of field names that were originally explicitly set during instantiation. If provided, this is directly used for the [model_fields_set][pydantic.BaseModel.model_fields_set] attribute. Otherwise, the field names from the values argument will be used.
values – Trusted or pre-validated data dictionary.

Returns:

A new instance of the Model class with validated data.

model_copy( *, update:Mapping[str,Any]|None=None, deep:bool=False, )→Self#

!!! abstract “Usage Documentation”: [model_copy](../concepts/serialization.md#model_copy)

Returns a copy of the model.

!!! note: The underlying instance’s [__dict__][object.__dict__] attribute is copied. This might have unexpected side effects if you store anything in it, on top of the model fields (e.g. the value of [cached properties][functools.cached_property]).

Parameters:

update – Values to change/add in the new model. Note: the data is not validated before creating the new model. You should trust this data.
deep – Set to True to make a deep copy of the model.

Returns:

New model instance.

!!! abstract “Usage Documentation”: [model_dump](../concepts/serialization.md#modelmodel_dump)

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

Parameters:

mode – The mode in which to_python should run. If mode is ‘json’, the output will only contain JSON serializable types. If mode is ‘python’, the output may contain non-JSON-serializable Python objects.
include – A set of fields to include in the output.
exclude – A set of fields to exclude from the output.
context – Additional context to pass to the serializer.
by_alias – Whether to use the field’s alias in the dictionary key if defined.
exclude_unset – Whether to exclude fields that have not been explicitly set.
exclude_defaults – Whether to exclude fields that are set to their default value.
exclude_none – Whether to exclude fields that have a value of None.
round_trip – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings – How to handle serialization errors. False/“none” ignores them, True/“warn” logs errors, “error” raises a [PydanticSerializationError][pydantic_core.PydanticSerializationError].
fallback – A function to call when an unknown value is encountered. If not provided, a [PydanticSerializationError][pydantic_core.PydanticSerializationError] error is raised.
serialize_as_any – Whether to serialize fields with duck-typing serialization behavior.

Returns:

A dictionary representation of the model.
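The three exclusion flags behave differently. exclude_unset looks at which fields the caller passed, while exclude_defaults compares values against the declared defaults. The following stdlib sketch uses the documented AttentionDpConfig defaults. The dump helper is hypothetical.

```python
from typing import Any, Dict, Set


def dump(values: Dict[str, Any], defaults: Dict[str, Any], fields_set: Set[str], *,
         exclude_unset: bool = False, exclude_defaults: bool = False,
         exclude_none: bool = False) -> Dict[str, Any]:
    """Hypothetical analogue of the three model_dump exclusion flags."""
    out: Dict[str, Any] = {}
    for name, value in values.items():
        if exclude_unset and name not in fields_set:
            continue  # filled from a default, never passed explicitly
        if exclude_defaults and name in defaults and value == defaults[name]:
            continue  # equal to the default, even if it was passed explicitly
        if exclude_none and value is None:
            continue
        out[name] = value
    return out


# Defaults copied from the documented AttentionDpConfig fields.
defaults = {"enable_balance": False, "timeout_iters": 50, "batching_wait_iters": 10}
passed = {"enable_balance": True, "timeout_iters": 50}  # what the caller supplied
values = {**defaults, **passed}
fields_set = set(passed)

print(dump(values, defaults, fields_set, exclude_unset=True))
# {'enable_balance': True, 'timeout_iters': 50}
print(dump(values, defaults, fields_set, exclude_defaults=True))
# {'enable_balance': True}
```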

!!! abstract “Usage Documentation”: [model_dump_json](../concepts/serialization.md#modelmodel_dump_json)

Generates a JSON representation of the model using Pydantic’s to_json method.

Parameters:

indent – Indentation to use in the JSON output. If None is passed, the output will be compact.
include – Field(s) to include in the JSON output.
exclude – Field(s) to exclude from the JSON output.
context – Additional context to pass to the serializer.
by_alias – Whether to serialize using field aliases.
exclude_unset – Whether to exclude fields that have not been explicitly set.
exclude_defaults – Whether to exclude fields that are set to their default value.
exclude_none – Whether to exclude fields that have a value of None.
round_trip – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings – How to handle serialization errors. False/“none” ignores them, True/“warn” logs errors, “error” raises a [PydanticSerializationError][pydantic_core.PydanticSerializationError].
fallback – A function to call when an unknown value is encountered. If not provided, a [PydanticSerializationError][pydantic_core.PydanticSerializationError] error is raised.
serialize_as_any – Whether to serialize fields with duck-typing serialization behavior.

Returns:

A JSON string representation of the model.

Generates a JSON schema for a model class.

Parameters:

by_alias – Whether to use attribute aliases or not.
ref_template – The reference template.
schema_generator – To override the logic used to generate the JSON schema, as a subclass of GenerateJsonSchema with your desired modifications.
mode – The mode in which to generate the schema.

Returns:

The JSON schema for the given model class.

classmethodmodel_parametrized_name( params:tuple[type[Any],...], )→str#

Compute the class name for parametrizations of generic classes.

This method can be overridden to achieve a custom naming scheme for generic BaseModels.

Parameters:: params – Tuple of types of the class. Given a generic class Model with 2 type variables and a concrete model Model[str, int], the value (str, int) would be passed to params.
Returns:: String representing the new class where params are passed to cls as type variables.
Raises:: TypeError – Raised when trying to generate concrete names for non-generic models.

model_post_init(context:Any,/)→None#: Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.
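The following stdlib analogue shows whole-model post-initialization, using a dataclass __post_init__ in place of model_post_init. WindowConfig, its validation rule, and the derived _min_hits value are hypothetical and only illustrate the pattern.

```python
import math
from dataclasses import dataclass, field


@dataclass
class WindowConfig:
    """Hypothetical config; the rule and derived value are illustrative only."""
    window: int = 8
    threshold: float = 0.5
    _min_hits: int = field(init=False, repr=False)

    def __post_init__(self) -> None:
        # Runs once every field is populated, so fields can be checked together...
        if self.window <= 0 or not 0.0 <= self.threshold <= 1.0:
            raise ValueError("window must be positive and threshold within [0, 1]")
        # ...and a private value can be derived from several of them.
        self._min_hits = math.ceil(self.window * self.threshold)


print(WindowConfig()._min_hits)                         # 4
print(WindowConfig(window=5, threshold=0.5)._min_hits)  # 3
```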

classmethodmodel_rebuild( *, force:bool=False, raise_errors:bool=True, _parent_namespace_depth:int=2, _types_namespace:MappingNamespace|None=None, )→bool|None#

Try to rebuild the pydantic-core schema for the model.

This may be necessary when one of the annotations is a ForwardRef which could not be resolved during the initial attempt to build the schema, and automatic rebuilding fails.

Parameters:

force – Whether to force the rebuilding of the model schema, defaults to False.
raise_errors – Whether to raise errors, defaults to True.
_parent_namespace_depth – The depth level of the parent namespace, defaults to 2.
_types_namespace – The types namespace, defaults to None.

Returns:

Returns None if the schema is already “complete” and rebuilding was not required. If rebuilding _was_ required, returns True if rebuilding was successful, otherwise False.

Validate a pydantic model instance.

Parameters:

obj – The object to validate.
strict – Whether to enforce types strictly.
from_attributes – Whether to extract data from object attributes.
context – Additional context to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Raises:

ValidationError – If the object could not be validated.

Returns:

The validated model instance.

!!! abstract “Usage Documentation”: [JSON Parsing](../concepts/json.md#json-parsing)

Validate the given JSON data against the Pydantic model.

Parameters:

json_data – The JSON data to validate.
strict – Whether to enforce types strictly.
context – Extra variables to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Returns:

The validated Pydantic model.

Raises:

ValidationError – If json_data is not a JSON string or the object could not be validated.

classmethodmodel_validate_strings( obj:Any, *, strict:bool|None=None, context:Any|None=None, by_alias:bool|None=None, by_name:bool|None=None, )→Self#

Validate the given object with string data against the Pydantic model.

Parameters:

obj – The object containing string data to validate.
strict – Whether to enforce types strictly.
context – Extra variables to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Returns:

The validated Pydantic model.
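model_validate_strings accepts string values and converts them to the declared field types. The following simplified stdlib sketch uses the documented AttentionDpConfig field names. coerce and validate_strings are hypothetical, and the accepted boolean spellings are a small subset chosen for illustration.

```python
from typing import Any, Dict

_TRUE = {"true", "1", "yes", "on"}
_FALSE = {"false", "0", "no", "off"}


def coerce(raw: str, expected: type) -> Any:
    """Convert one string value to the expected type (a deliberately small rule set)."""
    if expected is bool:
        lowered = raw.strip().lower()
        if lowered in _TRUE:
            return True
        if lowered in _FALSE:
            return False
        raise ValueError(f"cannot parse {raw!r} as bool")
    return expected(raw)  # e.g. int("100"); raises ValueError on bad input


def validate_strings(data: Dict[str, str], schema: Dict[str, type]) -> Dict[str, Any]:
    unknown = set(data) - set(schema)
    if unknown:
        raise ValueError(f"unexpected fields: {sorted(unknown)}")
    return {name: coerce(raw, schema[name]) for name, raw in data.items()}


# Field names and types taken from the documented AttentionDpConfig fields.
schema = {"enable_balance": bool, "timeout_iters": int, "batching_wait_iters": int}
print(validate_strings({"enable_balance": "true", "timeout_iters": "100"}, schema))
# {'enable_balance': True, 'timeout_iters': 100}
```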

classmethodparse_file( path:str|Path, *, content_type:str|None=None, encoding:str='utf8', proto:DeprecatedParseProtocol|None=None, allow_pickle:bool=False, )→Self#

classmethodparse_obj(obj:Any)→Self#

classmethodparse_raw( b:str|bytes, *, content_type:str|None=None, encoding:str='utf8', proto:DeprecatedParseProtocol|None=None, allow_pickle:bool=False, )→Self#

classmethodschema( by_alias:bool=True, ref_template:str='#/$defs/{model}', )→Dict[str,Any]#

classmethodschema_json(
*,
by_alias:bool=True,
ref_template:str='#/$defs/{model}',
**dumps_kwargs:Any,
)→str#

classmethodupdate_forward_refs(**localns:Any)→None#

classmethodvalidate(value:Any)→Self#

model_computed_fields={}#

model_config:ClassVar[ConfigDict]={'extra':'forbid'}#: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

propertymodel_extra:dict[str,Any]|None#

Get extra fields set during validation.

Returns:: A dictionary of extra fields, or None if config.extra is not set to “allow”.

model_fields={'batching_wait_iters':FieldInfo(annotation=int,required=False,default=10,description='The number of iterations to wait for batching.'),'enable_balance':FieldInfo(annotation=bool,required=False,default=False,description='Whether to enable balance.'),'timeout_iters':FieldInfo(annotation=int,required=False,default=50,description='The number of iterations to timeout.')}#

propertymodel_fields_set:set[str]#

Returns the set of fields that have been explicitly set on this model instance.

Returns:

A set of strings representing the fields that have been set, i.e. that were not filled from defaults.

classtensorrt_llm.llmapi.LoRARequest( lora_name:str, lora_int_id:int, lora_path:str='', lora_ckpt_source:str='hf', )[source]#

Bases:object

Request for a LoRA adapter.

__init__( lora_name:str, lora_int_id:int, lora_path:str='', lora_ckpt_source:str='hf', )→None#

propertyadapter_id#

propertyckpt_source#

lora_ckpt_source:str#

lora_int_id:int#

lora_name:str#

lora_path:str#

propertyname#

propertypath#
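The following minimal mirror of the documented constructor assumes that each property simply exposes the matching field, for example adapter_id returning lora_int_id. LoRARequestSketch is hypothetical. With the real class, a request is typically supplied together with the prompt at generation time; check the LLM.generate signature for the exact parameter name.

```python
from dataclasses import dataclass


@dataclass
class LoRARequestSketch:
    """Hypothetical mirror of the documented LoRARequest fields, not the real class."""
    lora_name: str
    lora_int_id: int
    lora_path: str = ""
    lora_ckpt_source: str = "hf"

    # Assumed mapping of the documented read-only properties onto the fields.
    @property
    def adapter_id(self) -> int:
        return self.lora_int_id

    @property
    def name(self) -> str:
        return self.lora_name

    @property
    def path(self) -> str:
        return self.lora_path

    @property
    def ckpt_source(self) -> str:
        return self.lora_ckpt_source


req = LoRARequestSketch("my-adapter", 1, "/path/to/adapter")
print(req.adapter_id, req.name, req.ckpt_source)  # 1 my-adapter hf
```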

classtensorrt_llm.llmapi.SaveHiddenStatesDecodingConfig( *, max_draft_len:int|None=None, max_total_draft_tokens:int|None=1, speculative_model_dir:str|Path|None=None, max_concurrency:int|None=None, draft_len_schedule:dict[int,int]|None=None, load_format:str|None=None, acceptance_window:int|None=None, acceptance_length_threshold:float|None=None, output_directory:str, write_interval:int=20, file_prefix:str='data', eagle3_layers_to_capture:Set[int]|None=None, eagle_choices:List[List[int]]|None=None, )[source]#

Bases:DecodingBaseConfig

fieldacceptance_length_threshold:float|None=None#

fieldacceptance_window:int|None=None#

fielddraft_len_schedule:dict[int,int]|None=None#

fieldeagle3_layers_to_capture:Set[int]|None=None#

fieldeagle_choices:List[List[int]]|None=None#

fieldfile_prefix:str='data'#

fieldload_format:str|None=None#

fieldmax_concurrency:int|None=None#

fieldmax_draft_len:int|None=None#

fieldmax_total_draft_tokens:int|None=1#

fieldoutput_directory:str[Required]#

fieldspeculative_model_dir:str|Path|None=None#

fieldwrite_interval:int=20#

classConfig#

Bases:object

extra='forbid'#

__init__(**data:Any)→None#

Create a new model by parsing and validating input data from keyword arguments.

Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

classmethodconstruct(
_fields_set:set[str]|None=None,
**values:Any,
)→Self#

Returns a copy of the model.

!!! warning “Deprecated”: This method is now deprecated; use model_copy instead.

If you need include or exclude, use:

```python
data = self.model_dump(include=include, exclude=exclude, round_trip=True)
data = {**data, **(update or {})}
copied = self.model_validate(data)
```

Parameters:

include – Optional set or mapping specifying which fields to include in the copied model.
exclude – Optional set or mapping specifying which fields to exclude in the copied model.
update – Optional dictionary of field-value pairs to override field values in the copied model.
deep – If True, the values of fields that are Pydantic models will be deep-copied.

Returns:

A copy of the model with included, excluded and updated fields as specified.

dict(*, include: set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None = None, exclude: set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None = None, by_alias: bool = False, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False) → Dict[str, Any]

classmethod from_dict(data: dict)[source]

classmethod from_orm(obj: Any) → Self

json(
*,
include: set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None = None,
exclude: set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None = None,
by_alias: bool = False,
exclude_unset: bool = False,
exclude_defaults: bool = False,
exclude_none: bool = False,
encoder: Callable[[Any], Any] | None = PydanticUndefined,
models_as_dict: bool = PydanticUndefined,
**dumps_kwargs: Any,
) → str

classmethod model_construct(
_fields_set: set[str] | None = None,
**values: Any,
) → Self

Creates a new instance of the Model class with validated data.

Creates a new model setting __dict__ and __pydantic_fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed.

Note: model_construct() generally respects the model_config.extra setting on the provided model. That is, if model_config.extra == 'allow', then all extra passed values are added to the model instance's __dict__ and __pydantic_extra__ fields. If model_config.extra == 'ignore' (the default), then all extra passed values are ignored. Because no validation is performed with a call to model_construct(), having model_config.extra == 'forbid' does not result in an error if extra values are passed, but they will be ignored.

Parameters:

_fields_set – A set of field names that were originally explicitly set during instantiation. If provided, this is directly used for the model_fields_set attribute. Otherwise, the field names from the values argument will be used.
values – Trusted or pre-validated data dictionary.

Returns:

A new instance of the Model class with validated data.
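The difference between the validating constructor and model_construct() under extra='forbid' can be modeled in plain Python. ConfigSketch and construct_trusted below are hypothetical names. The sketch only mimics the behavior described above (defaults applied, no type checks, unknown keys silently dropped) and does not reproduce Pydantic's internals.

```python
from typing import Any, Dict


class ConfigSketch:
    """Hypothetical model with one int field and extra='forbid' semantics."""

    _defaults: Dict[str, Any] = {"write_interval": 20}

    def __init__(self, **data: Any) -> None:
        # Validating path (like __init__): reject unknown keys and check types.
        unknown = sorted(set(data) - set(self._defaults))
        if unknown:
            raise ValueError(f"extra fields not permitted: {unknown}")
        for name, value in data.items():
            if not isinstance(value, int):
                raise TypeError(f"{name} must be an int")
        self.__dict__.update({**self._defaults, **data})

    @classmethod
    def construct_trusted(cls, **values: Any) -> "ConfigSketch":
        # Trusted path (like model_construct): apply defaults, skip type checks,
        # and silently ignore keys that are not declared fields.
        obj = cls.__new__(cls)
        known = {k: v for k, v in values.items() if k in cls._defaults}
        obj.__dict__.update({**cls._defaults, **known})
        return obj


c = ConfigSketch.construct_trusted(write_interval="oops", bogus=1)
print(c.write_interval, hasattr(c, "bogus"))  # oops False
```

The wrong-typed value is accepted as-is on the trusted path. This is why model_construct should only receive data that is already known to be valid.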

model_copy(*, update: Mapping[str, Any] | None = None, deep: bool = False) → Self

Usage documentation: see model_copy in the Pydantic serialization concepts documentation.

Returns a copy of the model.

Note: The underlying instance's __dict__ attribute is copied. This might have unexpected side effects if you store anything in it on top of the model fields (e.g. the value of cached properties created with functools.cached_property).

Parameters:

update – Values to change/add in the new model. Note: the data is not validated before creating the new model. You should trust this data.
deep – Set to True to make a deep copy of the model.

Returns:

New model instance.
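The effect of the deep flag can be seen with Python's copy module. Pydantic's model_copy copies the instance __dict__, shallowly unless deep=True. The sketch below therefore uses copy.copy and copy.deepcopy on a hypothetical dataclass, ScheduleSketch, as an analogy rather than calling model_copy itself.

```python
import copy
from dataclasses import dataclass, field
from typing import List


@dataclass
class ScheduleSketch:
    """Hypothetical model holding a mutable (nested) value."""

    thresholds: List[int] = field(default_factory=list)


original = ScheduleSketch(thresholds=[1, 4, 8])
shallow = copy.copy(original)   # analogous to model_copy(deep=False)
deep = copy.deepcopy(original)  # analogous to model_copy(deep=True)

original.thresholds.append(16)
print(shallow.thresholds)  # [1, 4, 8, 16]: the list object is shared
print(deep.thresholds)     # [1, 4, 8]: the list was copied
```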

Usage documentation: see model_dump in the Pydantic serialization concepts documentation.

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

Parameters:

mode – The mode in which to_python should run. If mode is 'json', the output will only contain JSON-serializable types. If mode is 'python', the output may contain non-JSON-serializable Python objects.
include – A set of fields to include in the output.
exclude – A set of fields to exclude from the output.
context – Additional context to pass to the serializer.
by_alias – Whether to use the field's alias in the dictionary key if defined.
exclude_unset – Whether to exclude fields that have not been explicitly set.
exclude_defaults – Whether to exclude fields that are set to their default value.
exclude_none – Whether to exclude fields that have a value of None.
round_trip – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings – How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors, and "error" raises a pydantic_core.PydanticSerializationError.
fallback – A function to call when an unknown value is encountered. If not provided, a pydantic_core.PydanticSerializationError is raised.
serialize_as_any – Whether to serialize fields with duck-typing serialization behavior.

Returns:

A dictionary representation of the model.
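The three exclude_* flags filter on different criteria, which is easiest to see side by side. dump_sketch below is a hypothetical helper operating on plain dicts, not Pydantic's serializer. It assumes exclude_defaults compares each value with its declared default by equality. The sample values reuse field names and defaults from the field list above.

```python
from typing import Any, Dict, Set


def dump_sketch(
    values: Dict[str, Any],
    defaults: Dict[str, Any],
    fields_set: Set[str],
    exclude_unset: bool = False,
    exclude_defaults: bool = False,
    exclude_none: bool = False,
) -> Dict[str, Any]:
    out: Dict[str, Any] = {}
    for name, value in values.items():
        if exclude_unset and name not in fields_set:
            continue  # filled from a default, never set explicitly
        if exclude_defaults and name in defaults and value == defaults[name]:
            continue  # equals its default, even if it was set explicitly
        if exclude_none and value is None:
            continue
        out[name] = value
    return out


values = {"output_directory": "/tmp/out", "file_prefix": "data",
          "write_interval": 20, "max_concurrency": None}
defaults = {"file_prefix": "data", "write_interval": 20, "max_concurrency": None}
fields_set = {"output_directory", "write_interval"}  # set explicitly by the caller

print(dump_sketch(values, defaults, fields_set, exclude_unset=True))
# {'output_directory': '/tmp/out', 'write_interval': 20}
print(dump_sketch(values, defaults, fields_set, exclude_defaults=True))
# {'output_directory': '/tmp/out'}
print(dump_sketch(values, defaults, fields_set, exclude_none=True))
# {'output_directory': '/tmp/out', 'file_prefix': 'data', 'write_interval': 20}
```

Note that write_interval survives exclude_unset (it was set explicitly) but not exclude_defaults (its value equals the default).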

Usage documentation: see model_dump_json in the Pydantic serialization concepts documentation.

Generates a JSON representation of the model using Pydantic's to_json method.

Parameters:

indent – Indentation to use in the JSON output. If None is passed, the output will be compact.
include – Field(s) to include in the JSON output.
exclude – Field(s) to exclude from the JSON output.
context – Additional context to pass to the serializer.
by_alias – Whether to serialize using field aliases.
exclude_unset – Whether to exclude fields that have not been explicitly set.
exclude_defaults – Whether to exclude fields that are set to their default value.
exclude_none – Whether to exclude fields that have a value of None.
round_trip – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings – How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors, and "error" raises a pydantic_core.PydanticSerializationError.
fallback – A function to call when an unknown value is encountered. If not provided, a pydantic_core.PydanticSerializationError is raised.
serialize_as_any – Whether to serialize fields with duck-typing serialization behavior.

Returns:

A JSON string representation of the model.

Generates a JSON schema for a model class.

Parameters:

by_alias – Whether to use attribute aliases or not.
ref_template – The reference template.
schema_generator – To override the logic used to generate the JSON schema, as a subclass of GenerateJsonSchema with your desired modifications.
mode – The mode in which to generate the schema.

Returns:

The JSON schema for the given model class.

classmethod model_parametrized_name(params: tuple[type[Any], ...]) → str

Compute the class name for parametrizations of generic classes.

This method can be overridden to achieve a custom naming scheme for generic BaseModels.

Parameters:

params – Tuple of types of the class. Given a generic class Model with 2 type variables and a concrete model Model[str, int], the value (str, int) would be passed to params.

Returns:

String representing the new class where params are passed to cls as type variables.

Raises:

TypeError – Raised when trying to generate concrete names for non-generic models.
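The default naming scheme described above produces names such as Model[str, int]. The function below is a hypothetical re-creation of that behavior for illustration only. It renders each type with its __name__, which simplifies however Pydantic formats types internally.

```python
from typing import Any, Generic, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")


class Model(Generic[K, V]):
    """A generic class with two type variables, as in the description above."""


def parametrized_name_sketch(cls: type, params: Tuple[Any, ...]) -> str:
    # Only generic classes get concrete names, mirroring the documented TypeError.
    if not issubclass(cls, Generic):
        raise TypeError("Concrete names should only be generated for generic models.")
    names = [p if isinstance(p, str) else getattr(p, "__name__", repr(p)) for p in params]
    return f"{cls.__name__}[{', '.join(names)}]"


print(parametrized_name_sketch(Model, (str, int)))  # Model[str, int]
```

An override of model_parametrized_name could return a different format (for example, joining the type names with underscores) while keeping the same non-generic check.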

model_post_init(_SaveHiddenStatesDecodingConfig__context)[source]

This function is meant to behave like a BaseModel method to initialise private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:

self – The BaseModel instance.
context – The context.

classmethod model_rebuild(*, force: bool = False, raise_errors: bool = True, _parent_namespace_depth: int = 2, _types_namespace: MappingNamespace | None = None) → bool | None

Try to rebuild the pydantic-core schema for the model.

This may be necessary when one of the annotations is a ForwardRef which could not be resolved during the initial attempt to build the schema, and automatic rebuilding fails.

Parameters:

force – Whether to force the rebuilding of the model schema, defaults to False.
raise_errors – Whether to raise errors, defaults to True.
_parent_namespace_depth – The depth level of the parent namespace, defaults to 2.
_types_namespace – The types namespace, defaults to None.

Returns:

Returns None if the schema is already "complete" and rebuilding was not required. If rebuilding was required, returns True if rebuilding was successful, otherwise False.

Validate a pydantic model instance.

Parameters:

obj – The object to validate.
strict – Whether to enforce types strictly.
from_attributes – Whether to extract data from object attributes.
context – Additional context to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Raises:

ValidationError – If the object could not be validated.

Returns:

The validated model instance.

Usage documentation: see JSON Parsing in the Pydantic JSON concepts documentation.

Validate the given JSON data against the Pydantic model.

Parameters:

json_data – The JSON data to validate.
strict – Whether to enforce types strictly.
context – Extra variables to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Returns:

The validated Pydantic model.

Raises:

ValidationError – If json_data is not a JSON string or the object could not be validated.

classmethod model_validate_strings(obj: Any, *, strict: bool | None = None, context: Any | None = None, by_alias: bool | None = None, by_name: bool | None = None) → Self

Validate the given object with string data against the Pydantic model.

Parameters:

obj – The object containing string data to validate.
strict – Whether to enforce types strictly.
context – Extra variables to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Returns:

The validated Pydantic model.

classmethod parse_file(path: str | Path, *, content_type: str | None = None, encoding: str = 'utf8', proto: DeprecatedParseProtocol | None = None, allow_pickle: bool = False) → Self

classmethod parse_obj(obj: Any) → Self

classmethod parse_raw(b: str | bytes, *, content_type: str | None = None, encoding: str = 'utf8', proto: DeprecatedParseProtocol | None = None, allow_pickle: bool = False) → Self

classmethod schema(by_alias: bool = True, ref_template: str = '#/$defs/{model}') → Dict[str, Any]

classmethod schema_json(
*,
by_alias: bool = True,
ref_template: str = '#/$defs/{model}',
**dumps_kwargs: Any,
) → str

supports_backend(backend: str) → bool

Override this method if the speculative decoding algorithm supports only a subset of the possible backends.

classmethod update_forward_refs(
**localns: Any,
) → None

validate() → None[source]

Do any additional error checking here.

validator validate_draft_len_schedule_and_sort » draft_len_schedule

Validate and sort draft_len_schedule by batch size thresholds.
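The validator's exact rules are not spelled out here, so the following is a hedged sketch only. It assumes draft_len_schedule maps positive batch-size thresholds to non-negative draft lengths, matching its dict[int, int] annotation in model_fields. It also assumes the result is sorted by ascending threshold. validate_and_sort_schedule is a hypothetical name, not the TensorRT-LLM validator.

```python
from typing import Dict, Optional


def validate_and_sort_schedule(
    schedule: Optional[Dict[int, int]],
) -> Optional[Dict[int, int]]:
    """Hypothetical: validate a {batch_size_threshold: draft_len} map and sort it."""
    if schedule is None:
        return None  # the field is optional and defaults to None
    for threshold, draft_len in schedule.items():
        # Assumed rules: thresholds are positive, draft lengths are non-negative.
        if not isinstance(threshold, int) or threshold <= 0:
            raise ValueError(f"batch size threshold must be a positive int, got {threshold!r}")
        if not isinstance(draft_len, int) or draft_len < 0:
            raise ValueError(f"draft length must be a non-negative int, got {draft_len!r}")
    # Dicts preserve insertion order, so rebuilding from sorted items yields an ordered map.
    return dict(sorted(schedule.items()))


print(validate_and_sort_schedule({32: 1, 1: 4, 8: 2}))  # {1: 4, 8: 2, 32: 1}
```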

decoding_type: ClassVar[str] = 'SaveState'

model_computed_fields = {}

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

property model_extra: dict[str, Any] | None

Get extra fields set during validation.

Returns:

A dictionary of extra fields, or None if config.extra is not set to "allow".

model_fields = {'acceptance_length_threshold': FieldInfo(annotation=Union[float, NoneType], required=False, default=None), 'acceptance_window': FieldInfo(annotation=Union[int, NoneType], required=False, default=None), 'draft_len_schedule': FieldInfo(annotation=Union[dict[int, int], NoneType], required=False, default=None), 'eagle3_layers_to_capture': FieldInfo(annotation=Union[Set[int], NoneType], required=False, default=None), 'eagle_choices': FieldInfo(annotation=Union[List[List[int]], NoneType], required=False, default=None, init=False), 'file_prefix': FieldInfo(annotation=str, required=False, default='data'), 'load_format': FieldInfo(annotation=Union[str, NoneType], required=False, default=None), 'max_concurrency': FieldInfo(annotation=Union[int, NoneType], required=False, default=None), 'max_draft_len': FieldInfo(annotation=Union[int, NoneType], required=False, default=None), 'max_total_draft_tokens': FieldInfo(annotation=Union[int, NoneType], required=False, default=1, init=False), 'output_directory': FieldInfo(annotation=str, required=True), 'speculative_model_dir': FieldInfo(annotation=Union[str, Path, NoneType], required=False, default=None), 'write_interval': FieldInfo(annotation=int, required=False, default=20)}

property model_fields_set: set[str]

Returns the set of fields that have been explicitly set on this model instance.

Returns:

A set of strings representing the fields that have been set, i.e. that were not filled from defaults.

property num_capture_layers

Returns the number of target-model layers to capture. If eagle3_layers_to_capture is not None, returns the size of that set. Otherwise, assumes the Eagle3 base set and returns 3 + 1 = 4 (the extra 1 is for the post-norm last hidden state).
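The rule for num_capture_layers fits in a few lines. This standalone function is a sketch of the documented behavior, not the property's actual source.

```python
from typing import Optional, Set


def num_capture_layers(eagle3_layers_to_capture: Optional[Set[int]]) -> int:
    """Sketch of the documented rule behind the num_capture_layers property."""
    if eagle3_layers_to_capture is not None:
        # An explicit layer set: capture exactly those layers.
        return len(eagle3_layers_to_capture)
    # Default Eagle3 base set of 3 layers, plus 1 for the post-norm last hidden state.
    return 3 + 1


print(num_capture_layers(None))         # 4
print(num_capture_layers({2, 16, 29}))  # 3
```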

property spec_dec_mode

Bases: BaseSparseAttentionConfig

Configuration for RocketKV sparse attention.

field kernel_size: int | None = None

The kernel size for snap KV.

field page_size: int | None = 3

Page size.

field prompt_budget: int | None = 1266

Prompt budget.

field topk: int | None = 128

Top-k.

field topr: int | float | None = 76

Top-r.

field window_size: int | None = None

The window size for snap KV.

class Config

Bases: object

extra = 'forbid'

__init__(**data: Any) → None

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

classmethod construct(
_fields_set: set[str] | None = None,
**values: Any,
) → Self

Returns a copy of the model.

Deprecated: This method is now deprecated; use model_copy instead.

If you need include or exclude, use:

```python
data = self.model_dump(include=include, exclude=exclude, round_trip=True)
data = {**data, **(update or {})}
copied = self.model_validate(data)
```

Parameters:

include – Optional set or mapping specifying which fields to include in the copied model.
exclude – Optional set or mapping specifying which fields to exclude in the copied model.
update – Optional dictionary of field-value pairs to override field values in the copied model.
deep – If True, the values of fields that are Pydantic models will be deep-copied.

Returns:

A copy of the model with included, excluded and updated fields as specified.

dict(*, include: set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None = None, exclude: set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None = None, by_alias: bool = False, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False) → Dict[str, Any]

classmethod from_dict(data: dict)[source]

classmethod from_orm(obj: Any) → Self

get_indices_block_size() → int[source]

json(
*,
include: set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None = None,
exclude: set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None = None,
by_alias: bool = False,
exclude_unset: bool = False,
exclude_defaults: bool = False,
exclude_none: bool = False,
encoder: Callable[[Any], Any] | None = PydanticUndefined,
models_as_dict: bool = PydanticUndefined,
**dumps_kwargs: Any,
) → str

classmethod model_construct(
_fields_set: set[str] | None = None,
**values: Any,
) → Self

Creates a new instance of the Model class with validated data.

Creates a new model setting __dict__ and __pydantic_fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed.

Note: model_construct() generally respects the model_config.extra setting on the provided model. That is, if model_config.extra == 'allow', then all extra passed values are added to the model instance's __dict__ and __pydantic_extra__ fields. If model_config.extra == 'ignore' (the default), then all extra passed values are ignored. Because no validation is performed with a call to model_construct(), having model_config.extra == 'forbid' does not result in an error if extra values are passed, but they will be ignored.

Parameters:

_fields_set – A set of field names that were originally explicitly set during instantiation. If provided, this is directly used for the model_fields_set attribute. Otherwise, the field names from the values argument will be used.
values – Trusted or pre-validated data dictionary.

Returns:

A new instance of the Model class with validated data.

model_copy(*, update: Mapping[str, Any] | None = None, deep: bool = False) → Self

Usage documentation: see model_copy in the Pydantic serialization concepts documentation.

Returns a copy of the model.

Note: The underlying instance's __dict__ attribute is copied. This might have unexpected side effects if you store anything in it on top of the model fields (e.g. the value of cached properties created with functools.cached_property).

Parameters:

update – Values to change/add in the new model. Note: the data is not validated before creating the new model. You should trust this data.
deep – Set to True to make a deep copy of the model.

Returns:

New model instance.

Usage documentation: see model_dump in the Pydantic serialization concepts documentation.

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

Parameters:

mode – The mode in which to_python should run. If mode is 'json', the output will only contain JSON-serializable types. If mode is 'python', the output may contain non-JSON-serializable Python objects.
include – A set of fields to include in the output.
exclude – A set of fields to exclude from the output.
context – Additional context to pass to the serializer.
by_alias – Whether to use the field's alias in the dictionary key if defined.
exclude_unset – Whether to exclude fields that have not been explicitly set.
exclude_defaults – Whether to exclude fields that are set to their default value.
exclude_none – Whether to exclude fields that have a value of None.
round_trip – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings – How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors, and "error" raises a pydantic_core.PydanticSerializationError.
fallback – A function to call when an unknown value is encountered. If not provided, a pydantic_core.PydanticSerializationError is raised.
serialize_as_any – Whether to serialize fields with duck-typing serialization behavior.

Returns:

A dictionary representation of the model.

Usage documentation: see model_dump_json in the Pydantic serialization concepts documentation.

Generates a JSON representation of the model using Pydantic's to_json method.

Parameters:

indent – Indentation to use in the JSON output. If None is passed, the output will be compact.
include – Field(s) to include in the JSON output.
exclude – Field(s) to exclude from the JSON output.
context – Additional context to pass to the serializer.
by_alias – Whether to serialize using field aliases.
exclude_unset – Whether to exclude fields that have not been explicitly set.
exclude_defaults – Whether to exclude fields that are set to their default value.
exclude_none – Whether to exclude fields that have a value of None.
round_trip – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings – How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors, and "error" raises a pydantic_core.PydanticSerializationError.
fallback – A function to call when an unknown value is encountered. If not provided, a pydantic_core.PydanticSerializationError is raised.
serialize_as_any – Whether to serialize fields with duck-typing serialization behavior.

Returns:

A JSON string representation of the model.

Generates a JSON schema for a model class.

Parameters:

by_alias – Whether to use attribute aliases or not.
ref_template – The reference template.
schema_generator – To override the logic used to generate the JSON schema, as a subclass of GenerateJsonSchema with your desired modifications.
mode – The mode in which to generate the schema.

Returns:

The JSON schema for the given model class.

classmethod model_parametrized_name(params: tuple[type[Any], ...]) → str

Compute the class name for parametrizations of generic classes.

This method can be overridden to achieve a custom naming scheme for generic BaseModels.

Parameters:

params – Tuple of types of the class. Given a generic class Model with 2 type variables and a concrete model Model[str, int], the value (str, int) would be passed to params.

Returns:

String representing the new class where params are passed to cls as type variables.

Raises:

TypeError – Raised when trying to generate concrete names for non-generic models.

model_post_init(context: Any, /) → None

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

classmethod model_rebuild(*, force: bool = False, raise_errors: bool = True, _parent_namespace_depth: int = 2, _types_namespace: MappingNamespace | None = None) → bool | None

Try to rebuild the pydantic-core schema for the model.

This may be necessary when one of the annotations is a ForwardRef which could not be resolved during the initial attempt to build the schema, and automatic rebuilding fails.

Parameters:

force – Whether to force the rebuilding of the model schema, defaults to False.
raise_errors – Whether to raise errors, defaults to True.
_parent_namespace_depth – The depth level of the parent namespace, defaults to 2.
_types_namespace – The types namespace, defaults to None.

Returns:

Returns None if the schema is already "complete" and rebuilding was not required. If rebuilding was required, returns True if rebuilding was successful, otherwise False.

Validate a pydantic model instance.

Parameters:

obj – The object to validate.
strict – Whether to enforce types strictly.
from_attributes – Whether to extract data from object attributes.
context – Additional context to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Raises:

ValidationError – If the object could not be validated.

Returns:

The validated model instance.

Usage documentation: see JSON Parsing in the Pydantic JSON concepts documentation.

Validate the given JSON data against the Pydantic model.

Parameters:

json_data – The JSON data to validate.
strict – Whether to enforce types strictly.
context – Extra variables to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Returns:

The validated Pydantic model.

Raises:

ValidationError – If json_data is not a JSON string or the object could not be validated.

classmethod model_validate_strings(obj: Any, *, strict: bool | None = None, context: Any | None = None, by_alias: bool | None = None, by_name: bool | None = None) → Self

Validate the given object with string data against the Pydantic model.

Parameters:

obj – The object containing string data to validate.
strict – Whether to enforce types strictly.
context – Extra variables to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Returns:

The validated Pydantic model.

classmethod parse_file(path: str | Path, *, content_type: str | None = None, encoding: str = 'utf8', proto: DeprecatedParseProtocol | None = None, allow_pickle: bool = False) → Self

classmethod parse_obj(obj: Any) → Self

classmethod parse_raw(b: str | bytes, *, content_type: str | None = None, encoding: str = 'utf8', proto: DeprecatedParseProtocol | None = None, allow_pickle: bool = False) → Self

classmethod schema(by_alias: bool = True, ref_template: str = '#/$defs/{model}') → Dict[str, Any]

classmethod schema_json(
*,
by_alias: bool = True,
ref_template: str = '#/$defs/{model}',
**dumps_kwargs: Any,
) → str

supports_backend(backend: str) → bool[source]

Override this method if the algorithm supports only a subset of the possible backends.

classmethod update_forward_refs(
**localns: Any,
) → None

classmethod validate(value: Any) → Self

algorithm: ClassVar[str] = 'rocket'

model_computed_fields = {}

model_config: ClassVar[ConfigDict] = {'extra': 'forbid'}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

property model_extra: dict[str, Any] | None

Get extra fields set during validation.

Returns:

A dictionary of extra fields, or None if config.extra is not set to "allow".

model_fields = {'kernel_size': FieldInfo(annotation=Union[int, NoneType], required=False, default=None, description='The kernel size for snap KV.'), 'page_size': FieldInfo(annotation=Union[int, NoneType], required=False, default=3, description='Page size'), 'prompt_budget': FieldInfo(annotation=Union[int, NoneType], required=False, default=1266, description='Prompt budget'), 'topk': FieldInfo(annotation=Union[int, NoneType], required=False, default=128, description='Top-k'), 'topr': FieldInfo(annotation=Union[int, float, NoneType], required=False, default=76, description='Top-r'), 'window_size': FieldInfo(annotation=Union[int, NoneType], required=False, default=None, description='The window size for snap KV.')}

property model_fields_set: set[str]

Returns the set of fields that have been explicitly set on this model instance.

Returns:

A set of strings representing the fields that have been set, i.e. that were not filled from defaults.

class tensorrt_llm.llmapi.DeepSeekSparseAttentionConfig(*, index_n_heads: int | None = None, index_head_dim: int | None = None, index_topk: int | None = None, indexer_max_chunk_size: int | None = None)[source]

Bases: BaseSparseAttentionConfig

Configuration for DeepSeek Sparse Attention.

field index_head_dim: int | None = None

The dimension of the indexer heads.

field index_n_heads: int | None = None

The number of heads for the indexer.

field index_topk: int | None = None

The top-k value for the indexer.

field indexer_max_chunk_size: int | None = None

The maximum chunk size for the indexer.

class Config

Bases: object

extra = 'forbid'

__init__(**data: Any) → None

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

classmethod construct(
_fields_set: set[str] | None = None,
**values: Any,
) → Self

Returns a copy of the model.

Deprecated: This method is now deprecated; use model_copy instead.

If you need include or exclude, use:

```python
data = self.model_dump(include=include, exclude=exclude, round_trip=True)
data = {**data, **(update or {})}
copied = self.model_validate(data)
```

Parameters:

include – Optional set or mapping specifying which fields to include in the copied model.
exclude – Optional set or mapping specifying which fields to exclude in the copied model.
update – Optional dictionary of field-value pairs to override field values in the copied model.
deep – If True, the values of fields that are Pydantic models will be deep-copied.

Returns:

A copy of the model with included, excluded and updated fields as specified.

dict(*, include: set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None = None, exclude: set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None = None, by_alias: bool = False, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False) → Dict[str, Any]

classmethod from_dict(data: dict)[source]

classmethod from_orm(obj: Any) → Self

get_indices_block_size() → int

json(
*,
include: set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None = None,
exclude: set[int] | set[str] | Mapping[int, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | Mapping[str, set[int] | set[str] | Mapping[int, IncEx | bool] | Mapping[str, IncEx | bool] | bool] | None = None,
by_alias: bool = False,
exclude_unset: bool = False,
exclude_defaults: bool = False,
exclude_none: bool = False,
encoder: Callable[[Any], Any] | None = PydanticUndefined,
models_as_dict: bool = PydanticUndefined,
**dumps_kwargs: Any,
) → str

classmethod model_construct(
_fields_set: set[str] | None = None,
**values: Any,
) → Self

Creates a new instance of the Model class with validated data.

Creates a new model setting __dict__ and __pydantic_fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed.

Note: model_construct() generally respects the model_config.extra setting on the provided model. That is, if model_config.extra == 'allow', then all extra passed values are added to the model instance's __dict__ and __pydantic_extra__ fields. If model_config.extra == 'ignore' (the default), then all extra passed values are ignored. Because no validation is performed with a call to model_construct(), having model_config.extra == 'forbid' does not result in an error if extra values are passed, but they will be ignored.

Parameters:

_fields_set – A set of field names that were originally explicitly set during instantiation. If provided, this is directly used for the model_fields_set attribute. Otherwise, the field names from the values argument will be used.
values – Trusted or pre-validated data dictionary.

Returns:

A new instance of the Model class with validated data.

model_copy(*, update: Mapping[str, Any] | None = None, deep: bool = False) → Self

Usage documentation: see model_copy in the Pydantic serialization concepts documentation.

Returns a copy of the model.

Note: The underlying instance's __dict__ attribute is copied. This might have unexpected side effects if you store anything in it on top of the model fields (e.g. the value of cached properties created with functools.cached_property).

Parameters:

update – Values to change/add in the new model. Note: the data is not validated before creating the new model. You should trust this data.
deep – Set to True to make a deep copy of the model.

Returns:

New model instance.

Usage documentation: see model_dump in the Pydantic serialization concepts documentation.

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

Parameters:

mode – The mode in which to_python should run. If mode is 'json', the output will only contain JSON-serializable types. If mode is 'python', the output may contain non-JSON-serializable Python objects.
include – A set of fields to include in the output.
exclude – A set of fields to exclude from the output.
context – Additional context to pass to the serializer.
by_alias – Whether to use the field's alias in the dictionary key if defined.
exclude_unset – Whether to exclude fields that have not been explicitly set.
exclude_defaults – Whether to exclude fields that are set to their default value.
exclude_none – Whether to exclude fields that have a value of None.
round_trip – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings – How to handle serialization errors. False/"none" ignores them, True/"warn" logs errors, and "error" raises a pydantic_core.PydanticSerializationError.
fallback – A function to call when an unknown value is encountered. If not provided, a pydantic_core.PydanticSerializationError is raised.
serialize_as_any – Whether to serialize fields with duck-typing serialization behavior.

Returns:

A dictionary representation of the model.
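To contrast the `exclude_*` options, here is a sketch with a hypothetical `DsaLikeConfig` stand-in (same four fields and `extra='forbid'` as this class; assumes Pydantic v2). One field is passed explicitly but with its default value, which is where the options differ.

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict


class DsaLikeConfig(BaseModel):
    # Hypothetical stand-in for the class documented on this page.
    model_config = ConfigDict(extra="forbid")

    index_head_dim: Optional[int] = None
    index_n_heads: Optional[int] = None
    index_topk: Optional[int] = None
    indexer_max_chunk_size: Optional[int] = None


# index_n_heads is set explicitly, but to its default value (None).
cfg = DsaLikeConfig(index_topk=2048, index_n_heads=None)

full = cfg.model_dump()
# {'index_head_dim': None, 'index_n_heads': None, 'index_topk': 2048, 'indexer_max_chunk_size': None}
only_set = cfg.model_dump(exclude_unset=True)  # {'index_n_heads': None, 'index_topk': 2048}
no_defaults = cfg.model_dump(exclude_defaults=True)  # {'index_topk': 2048}
no_none = cfg.model_dump(exclude_none=True)  # {'index_topk': 2048}
subset = cfg.model_dump(include={"index_topk"})  # {'index_topk': 2048}
print(full, only_set, no_defaults, no_none, subset, sep="\n")
```

exclude_unset keeps index_n_heads because it was passed explicitly, while exclude_defaults and exclude_none drop it because its value equals the default, None.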

Usage Documentation: model_dump_json (see concepts/serialization.md#modelmodel_dump_json in the Pydantic documentation).

Generates a JSON representation of the model using Pydantic’s to_json method.

Parameters:

indent – Indentation to use in the JSON output. If None is passed, the output will be compact.
include – Field(s) to include in the JSON output.
exclude – Field(s) to exclude from the JSON output.
context – Additional context to pass to the serializer.
by_alias – Whether to serialize using field aliases.
exclude_unset – Whether to exclude fields that have not been explicitly set.
exclude_defaults – Whether to exclude fields that are set to their default value.
exclude_none – Whether to exclude fields that have a value of None.
round_trip – If True, dumped values should be valid as input for non-idempotent types such as Json[T].
warnings – How to handle serialization errors. False/“none” ignores them, True/“warn” logs errors, and “error” raises a pydantic_core.PydanticSerializationError.
fallback – A function to call when an unknown value is encountered. If not provided, a pydantic_core.PydanticSerializationError is raised.
serialize_as_any – Whether to serialize fields with duck-typing serialization behavior.

Returns:

A JSON string representation of the model.
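A short sketch, assuming Pydantic v2 and a hypothetical `DsaLikeConfig` stand-in for this class, showing compact versus indented output. Without `indent`, Pydantic emits JSON with no whitespace between tokens.

```python
import json
from typing import Optional

from pydantic import BaseModel, ConfigDict


class DsaLikeConfig(BaseModel):
    # Hypothetical stand-in for the class documented on this page.
    model_config = ConfigDict(extra="forbid")

    index_n_heads: Optional[int] = None
    index_topk: Optional[int] = None


cfg = DsaLikeConfig(index_topk=2048)

compact = cfg.model_dump_json(exclude_none=True)
print(compact)  # {"index_topk":2048}

pretty = cfg.model_dump_json(indent=2)
print(pretty)

# The JSON text parses back to the same data that model_dump(mode="json") returns.
assert json.loads(pretty) == cfg.model_dump(mode="json")
```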

Generates a JSON schema for a model class.

Parameters:

by_alias – Whether to use attribute aliases or not.
ref_template – The reference template.
schema_generator – To override the logic used to generate the JSON schema, as a subclass of GenerateJsonSchema with your desired modifications.
mode – The mode in which to generate the schema.

Returns:

The JSON schema for the given model class.
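The sketch below inspects the generated schema of a hypothetical `DsaLikeConfig` stand-in (assuming Pydantic v2) whose field descriptions are copied from the model_fields listed for this class. It shows how `extra='forbid'` and all-optional fields show up in the schema.

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict, Field


class DsaLikeConfig(BaseModel):
    # Hypothetical stand-in; descriptions copied from model_fields on this page.
    model_config = ConfigDict(extra="forbid")

    index_topk: Optional[int] = Field(default=None, description="The topk for the indexer.")
    indexer_max_chunk_size: Optional[int] = Field(
        default=None, description="The maximum chunk size for the indexer."
    )


schema = DsaLikeConfig.model_json_schema()
print(schema["title"])  # DsaLikeConfig
print(schema["additionalProperties"])  # False, because extra="forbid"
print(schema["properties"]["index_topk"]["description"])  # The topk for the indexer.
print("required" in schema)  # False: every field has a default
```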

classmethod model_parametrized_name( params:tuple[type[Any],...], )→str#

Compute the class name for parametrizations of generic classes.

This method can be overridden to achieve a custom naming scheme for generic BaseModels.

Parameters:: params – Tuple of types of the class. Given a generic class Model with 2 type variables and a concrete model Model[str, int], the value (str, int) would be passed to params.
Returns:: String representing the new class where params are passed to cls as type variables.
Raises:: TypeError – Raised when trying to generate concrete names for non-generic models.
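A minimal sketch, assuming Pydantic v2, of the default naming scheme, a custom override, and the TypeError for a non-generic model. `Box`, `NamedBox`, and `Plain` are hypothetical example models.

```python
from typing import Any, Generic, Tuple, Type, TypeVar

from pydantic import BaseModel

T = TypeVar("T")


class Box(BaseModel, Generic[T]):
    item: T


class NamedBox(BaseModel, Generic[T]):
    item: T

    @classmethod
    def model_parametrized_name(cls, params: Tuple[Type[Any], ...]) -> str:
        # Custom scheme: NamedBox[int] is named "IntBox".
        return f"{params[0].__name__.title()}Box"


class Plain(BaseModel):
    item: int


print(Box[int].__name__)  # Box[int]
print(NamedBox[int].__name__)  # IntBox
print(repr(NamedBox[str](item="x")))  # StrBox(item='x')

# The base implementation refuses to name non-generic models.
raised = False
try:
    Plain.model_parametrized_name((int,))
except TypeError:
    raised = True
print(raised)  # True
```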

model_post_init( context:Any, /, )→None#: Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.
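As an illustration of whole-model validation, the sketch below overrides model_post_init on a hypothetical `CheckedConfig` model, assuming Pydantic v2. The cross-field rule (index_topk requires index_n_heads) is invented for the example and is not a documented TensorRT-LLM constraint.

```python
from typing import Any, Optional

from pydantic import BaseModel, ConfigDict


class CheckedConfig(BaseModel):
    # Hypothetical model; the cross-field rule below is invented for illustration.
    model_config = ConfigDict(extra="forbid")

    index_n_heads: Optional[int] = None
    index_topk: Optional[int] = None

    def model_post_init(self, context: Any, /) -> None:
        # Runs after all fields are populated, so it can compare several of them.
        if self.index_topk is not None and self.index_n_heads is None:
            raise ValueError("index_topk requires index_n_heads to be set")


ok = CheckedConfig(index_topk=2048, index_n_heads=64)
print(ok.index_topk)  # 2048

rejected = False
try:
    CheckedConfig(index_topk=2048)
except ValueError:  # Pydantic's ValidationError also subclasses ValueError
    rejected = True
print(rejected)  # True
```

Pydantic's `model_validator(mode="after")` decorator is an alternative way to express the same kind of check.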

classmethod model_rebuild( *, force:bool=False, raise_errors:bool=True, _parent_namespace_depth:int=2, _types_namespace:MappingNamespace|None=None, )→bool|None#

Try to rebuild the pydantic-core schema for the model.

This may be necessary when one of the annotations is a ForwardRef which could not be resolved during the initial attempt to build the schema, and automatic rebuilding fails.

Parameters:

force – Whether to force the rebuilding of the model schema, defaults to False.
raise_errors – Whether to raise errors, defaults to True.
_parent_namespace_depth – The depth level of the parent namespace, defaults to 2.
_types_namespace – The types namespace, defaults toNone.

Returns:

Returns None if the schema is already “complete” and rebuilding was not required. If rebuilding was required, returns True if rebuilding was successful, otherwise False.
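A minimal sketch of the forward-reference case, assuming Pydantic v2 and two hypothetical models: `Parent` refers to `Child` before `Child` exists, so its schema starts out incomplete until model_rebuild is called.

```python
from typing import Optional

from pydantic import BaseModel


class Parent(BaseModel):
    # "Child" is a forward reference: that class does not exist yet.
    child: Optional["Child"] = None


class Child(BaseModel):
    name: str


first = Parent.model_rebuild()  # "Child" can be resolved now, so the schema is rebuilt
second = Parent.model_rebuild()  # schema is already complete, nothing to do
print(first, second)  # True None

p = Parent(child={"name": "leaf"})
print(p.child.name)  # leaf
```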

Validate a pydantic model instance.

Parameters:

obj – The object to validate.
strict – Whether to enforce types strictly.
from_attributes – Whether to extract data from object attributes.
context – Additional context to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Raises:

ValidationError – If the object could not be validated.

Returns:

The validated model instance.
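The sketch below, assuming Pydantic v2 and a hypothetical `DsaLikeConfig` stand-in with the same `extra='forbid'` setting as this class, shows lax versus strict validation, rejection of an unknown key, and `from_attributes`.

```python
from types import SimpleNamespace
from typing import Optional

from pydantic import BaseModel, ConfigDict, ValidationError


class DsaLikeConfig(BaseModel):
    # Hypothetical stand-in for the class documented on this page.
    model_config = ConfigDict(extra="forbid")

    index_n_heads: Optional[int] = None
    index_topk: Optional[int] = None


def rejected(data, **kwargs) -> bool:
    """Return True if model_validate raises ValidationError for `data`."""
    try:
        DsaLikeConfig.model_validate(data, **kwargs)
    except ValidationError:
        return True
    return False


# Lax mode (the default) coerces the numeric string "2048" to an int.
lax = DsaLikeConfig.model_validate({"index_topk": "2048"})
print(lax.index_topk)  # 2048
print(rejected({"index_topk": "2048"}, strict=True))  # True: no coercion in strict mode
print(rejected({"index_top_k": 2048}))  # True: extra="forbid" rejects the misspelled key

# from_attributes=True reads fields from an arbitrary object's attributes.
from_obj = DsaLikeConfig.model_validate(SimpleNamespace(index_topk=512), from_attributes=True)
print(from_obj.index_topk)  # 512
```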

Usage Documentation: JSON Parsing (see concepts/json.md#json-parsing in the Pydantic documentation).

Validate the given JSON data against the Pydantic model.

Parameters:

json_data – The JSON data to validate.
strict – Whether to enforce types strictly.
context – Extra variables to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Returns:

The validated Pydantic model.

Raises:

ValidationError – If json_data is not a JSON string or the object could not be validated.
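A short sketch, assuming Pydantic v2 and a hypothetical `DsaLikeConfig` stand-in, showing that JSON is parsed and validated in one step and that malformed JSON is reported as a ValidationError rather than a json.JSONDecodeError.

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict, ValidationError


class DsaLikeConfig(BaseModel):
    # Hypothetical stand-in for the class documented on this page.
    model_config = ConfigDict(extra="forbid")

    index_n_heads: Optional[int] = None
    index_topk: Optional[int] = None


cfg = DsaLikeConfig.model_validate_json('{"index_topk": 2048, "index_n_heads": null}')
print(cfg.index_topk, cfg.index_n_heads)  # 2048 None

error_type = None
try:
    DsaLikeConfig.model_validate_json('{"index_topk": 2048')  # truncated JSON
except ValidationError as exc:
    error_type = exc.errors()[0]["type"]
print(error_type)  # json_invalid
```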

classmethod model_validate_strings( obj:Any, *, strict:bool|None=None, context:Any|None=None, by_alias:bool|None=None, by_name:bool|None=None, )→Self#

Validate the given object with string data against the Pydantic model.

Parameters:

obj – The object containing string data to validate.
strict – Whether to enforce types strictly.
context – Extra variables to pass to the validator.
by_alias – Whether to use the field’s alias when validating against the provided input data.
by_name – Whether to use the field’s name when validating against the provided input data.

Returns:

The validated Pydantic model.
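A minimal sketch, assuming a recent Pydantic v2 release and a hypothetical `DsaLikeConfig` stand-in, for the case where every value arrives as a string (for example, parsed from a command line; the scenario is illustrative).

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict, ValidationError


class DsaLikeConfig(BaseModel):
    # Hypothetical stand-in for the class documented on this page.
    model_config = ConfigDict(extra="forbid")

    index_n_heads: Optional[int] = None
    index_topk: Optional[int] = None


raw = {"index_topk": "2048", "index_n_heads": "64"}
cfg = DsaLikeConfig.model_validate_strings(raw)
print(cfg.index_topk + cfg.index_n_heads)  # 2112

bad_input_rejected = False
try:
    DsaLikeConfig.model_validate_strings({"index_topk": "many"})
except ValidationError:
    bad_input_rejected = True
print(bad_input_rejected)  # True
```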

classmethod parse_file( path:str|Path, *, content_type:str|None=None, encoding:str='utf8', proto:DeprecatedParseProtocol|None=None, allow_pickle:bool=False, )→Self#

classmethod parse_obj( obj:Any, )→Self#

classmethod parse_raw( b:str|bytes, *, content_type:str|None=None, encoding:str='utf8', proto:DeprecatedParseProtocol|None=None, allow_pickle:bool=False, )→Self#

classmethod schema( by_alias:bool=True, ref_template:str='#/$defs/{model}', )→Dict[str,Any]#

classmethod schema_json(
*,
by_alias:bool=True,
ref_template:str='#/$defs/{model}',
**dumps_kwargs:Any,
)→str#

supports_backend(backend:str)→bool[source]#: Override if the algorithm supports only a subset of the possible backends.

classmethod update_forward_refs(
**localns:Any,
)→None#

classmethod validate( value:Any, )→Self#

algorithm:ClassVar[str]='dsa'#

model_computed_fields={}#

model_config:ClassVar[ConfigDict]={'extra':'forbid'}#: Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

property model_extra:dict[str,Any]|None#

Get extra fields set during validation.

Returns:: A dictionary of extra fields, or None if config.extra is not set to “allow”.
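The sketch below contrasts the `extra='forbid'` setting used by this class with `extra='allow'`, assuming Pydantic v2; `ForbidModel` and `AllowModel` are hypothetical models.

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict


class ForbidModel(BaseModel):
    # Hypothetical model with the same extra="forbid" setting as this class.
    model_config = ConfigDict(extra="forbid")
    index_topk: Optional[int] = None


class AllowModel(BaseModel):
    # Hypothetical contrast: extra="allow" keeps unknown keys.
    model_config = ConfigDict(extra="allow")
    index_topk: Optional[int] = None


forbid_extra = ForbidModel(index_topk=1).model_extra
allow_extra = AllowModel(index_topk=1, note="x").model_extra
print(forbid_extra)  # None
print(allow_extra)  # {'note': 'x'}
```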

model_fields={'index_head_dim':FieldInfo(annotation=Union[int,NoneType],required=False,default=None,description='The dimension of the indexer heads.'),'index_n_heads':FieldInfo(annotation=Union[int,NoneType],required=False,default=None,description='The number of heads for the indexer.'),'index_topk':FieldInfo(annotation=Union[int,NoneType],required=False,default=None,description='The topk for the indexer.'),'indexer_max_chunk_size':FieldInfo(annotation=Union[int,NoneType],required=False,default=None,description='The maximum chunk size for the indexer.')}#
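To show how the FieldInfo entries above can be inspected programmatically, here is a sketch with a hypothetical `DsaLikeConfig` stand-in whose names, defaults, and descriptions are copied from this model_fields listing (assuming Pydantic v2).

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict, Field


class DsaLikeConfig(BaseModel):
    # Hypothetical stand-in; names and descriptions copied from model_fields on this page.
    model_config = ConfigDict(extra="forbid")

    index_head_dim: Optional[int] = Field(
        default=None, description="The dimension of the indexer heads.")
    index_n_heads: Optional[int] = Field(
        default=None, description="The number of heads for the indexer.")
    index_topk: Optional[int] = Field(
        default=None, description="The topk for the indexer.")
    indexer_max_chunk_size: Optional[int] = Field(
        default=None, description="The maximum chunk size for the indexer.")


# model_fields maps each field name to its FieldInfo, in definition order.
for name, info in DsaLikeConfig.model_fields.items():
    print(f"{name}: required={info.is_required()}, default={info.default!r} - {info.description}")
```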

property model_fields_set:set[str]#

Returns the set of fields that have been explicitly set on this model instance.

Returns:

A set of strings representing the fields that have been set, i.e. that were not filled from defaults.
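A short sketch, assuming Pydantic v2 and a hypothetical `DsaLikeConfig` stand-in, showing that a field counts as “set” whenever it is passed explicitly, even if the value equals its default.

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict


class DsaLikeConfig(BaseModel):
    # Hypothetical stand-in for the class documented on this page.
    model_config = ConfigDict(extra="forbid")

    index_topk: Optional[int] = None


unset = DsaLikeConfig().model_fields_set
explicit = DsaLikeConfig(index_topk=2048).model_fields_set
explicit_default = DsaLikeConfig(index_topk=None).model_fields_set
print(unset)  # set()
print(explicit)  # {'index_topk'}
print(explicit_default)  # {'index_topk'}
```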
