Quantization API Reference#
Created On: Jul 25, 2020 | Last Updated On: Jun 18, 2025
torch.ao.quantization#
This module contains Eager mode quantization APIs.
Top level APIs#
quantize | Quantize the input float model with post training static quantization. |
quantize_dynamic | Converts a float model to a dynamic (i.e. weights-only) quantized model. |
quantize_qat | Do quantization aware training and output a quantized model |
prepare | Prepares a copy of the model for quantization calibration or quantization-aware training. |
prepare_qat | Prepares a copy of the model for quantization calibration or quantization-aware training and converts it to quantized version. |
convert | Converts submodules in input module to a different module according to mapping by calling from_float method on the target module class. |
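As a minimal sketch of the simplest entry point in the table above, the snippet below applies quantize_dynamic to a toy model. The model, its layer sizes, and the variable names are illustrative assumptions; it also assumes a PyTorch build with a quantized engine (for example fbgemm or qnnpack) available.

```python
import torch
from torch import nn
from torch.ao.quantization import quantize_dynamic

# Toy float model (hypothetical; any model containing nn.Linear layers works).
float_model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2)).eval()

# Replace nn.Linear layers with dynamically quantized versions (int8 weights).
quantized_model = quantize_dynamic(float_model, {nn.Linear}, dtype=torch.qint8)

# Inputs and outputs stay in floating point; weights are stored quantized.
output = quantized_model(torch.randn(3, 8))
```

No calibration pass is needed here because activations are quantized on the fly at inference time.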
Preparing model for quantization#
fuse_modules | Fuse a list of modules into a single module. |
QuantStub | Quantize stub module. Before calibration, this is the same as an observer; it will be swapped for nnq.Quantize in convert. |
DeQuantStub | Dequantize stub module. Before calibration, this is the same as identity; it will be swapped for nnq.DeQuantize in convert. |
QuantWrapper | A wrapper class that wraps the input module, adds QuantStub and DeQuantStub, and surrounds the call to the module with calls to the quant and dequant modules. |
add_quant_dequant | Wrap the leaf child module in QuantWrapper if it has a valid qconfig. Note that this function modifies the children of module in place, and it can also return a new module which wraps the input module. |
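The stubs and helpers above are typically combined into an eager mode post training static quantization flow. The following is a sketch under stated assumptions: SmallNet, its layer sizes, and the random calibration data are hypothetical, and the qconfig is chosen from whatever quantized engine the local PyTorch build reports.

```python
import torch
from torch import nn
from torch.ao.quantization import (
    DeQuantStub,
    QuantStub,
    convert,
    fuse_modules,
    get_default_qconfig,
    prepare,
)


class SmallNet(nn.Module):
    """Hypothetical model: quantize -> conv -> relu -> dequantize."""

    def __init__(self):
        super().__init__()
        self.quant = QuantStub()      # becomes nnq.Quantize after convert
        self.conv = nn.Conv2d(1, 2, 3)
        self.relu = nn.ReLU()
        self.dequant = DeQuantStub()  # becomes nnq.DeQuantize after convert

    def forward(self, x):
        return self.dequant(self.relu(self.conv(self.quant(x))))


model = SmallNet().eval()

# Fuse conv + relu into one module (returns a fused copy by default).
fused = fuse_modules(model, [["conv", "relu"]])

# Pick a qconfig that matches the active quantized engine.
fused.qconfig = get_default_qconfig(torch.backends.quantized.engine)

# Insert observers, run calibration data through, then swap in quantized modules.
prepared = prepare(fused)
with torch.no_grad():
    prepared(torch.randn(4, 1, 8, 8))
quantized = convert(prepared)

output = quantized(torch.randn(4, 1, 8, 8))
```

In a real workflow the calibration step would use representative input data rather than random tensors.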
Utility functions#
swap_module | Swaps the module if it has a quantized counterpart and it has an observer attached. |
propagate_qconfig_ | Propagate qconfig through the module hierarchy and assign qconfig attribute on each leaf module. |
default_eval_fn | Define the default evaluation function. |
torch.ao.quantization.quantize_fx#
This module contains FX graph mode quantization APIs (prototype).
prepare_fx | Prepare a model for post training quantization |
prepare_qat_fx | Prepare a model for quantization aware training |
convert_fx | Convert a calibrated or trained model to a quantized model |
fuse_fx | Fuse modules like conv+bn, conv+bn+relu etc, model must be in eval mode. |
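A minimal sketch of the FX graph mode post training flow using the functions listed above. The toy model and random example inputs are assumptions for illustration, and the backend string is taken from the locally active quantized engine.

```python
import torch
from torch import nn
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import convert_fx, prepare_fx

# Hypothetical float model; FX mode requires it to be symbolically traceable.
float_model = nn.Sequential(nn.Linear(4, 4), nn.ReLU()).eval()
example_inputs = (torch.randn(2, 4),)

# Default post training qconfig mapping for the active quantized engine.
qconfig_mapping = get_default_qconfig_mapping(torch.backends.quantized.engine)

# Trace the model and insert observers.
prepared = prepare_fx(float_model, qconfig_mapping, example_inputs)

# Calibrate (random data here; use representative data in practice).
with torch.no_grad():
    prepared(*example_inputs)

# Lower the observed graph to a quantized GraphModule.
quantized = convert_fx(prepared)
output = quantized(*example_inputs)
```

Unlike the eager mode flow, no QuantStub or DeQuantStub is added by hand; quantize and dequantize nodes are inserted into the traced graph.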
torch.ao.quantization.qconfig_mapping#
This module contains QConfigMapping for configuring FX graph mode quantization.
QConfigMapping | Mapping from model ops to |
get_default_qconfig_mapping | Return the default QConfigMapping for post training quantization. |
get_default_qat_qconfig_mapping | Return the default QConfigMapping for quantization aware training. |
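For cases where the defaults are too coarse, a QConfigMapping can be built by hand. The sketch below is illustrative only: the module name "head" is hypothetical, and the particular qconfigs were chosen arbitrarily to show the chaining style.

```python
from torch import nn
from torch.ao.quantization.qconfig import default_dynamic_qconfig, default_qconfig
from torch.ao.quantization.qconfig_mapping import QConfigMapping

qconfig_mapping = (
    QConfigMapping()
    .set_global(default_qconfig)                          # fallback for all ops
    .set_object_type(nn.Linear, default_dynamic_qconfig)  # override by module type
    .set_module_name("head", None)                        # None skips quantization
)
```

Each setter returns the mapping itself, so calls can be chained; the result is what prepare_fx or prepare_qat_fx takes as its qconfig_mapping argument.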
torch.ao.quantization.backend_config#
This module contains BackendConfig, a config object that defines how quantization is supported in a backend. Currently only used by FX Graph Mode Quantization, but we may extend Eager Mode Quantization to work with this as well.
BackendConfig | Config that defines the set of patterns that can be quantized on a given backend, and how reference quantized models can be produced from these patterns. |
BackendPatternConfig | Config object that specifies quantization behavior for a given operator pattern. |
DTypeConfig | Config object that specifies the supported data types passed as arguments to quantize ops in the reference model spec, for input and output activations, weights, and biases. |
DTypeWithConstraints | Config for specifying additional constraints for a given dtype, such as quantization value ranges, scale value ranges, and fixed quantization params, to be used in |
ObservationType | An enum that represents different ways of how an operator/operator pattern should be observed |
torch.ao.quantization.fx.custom_config#
This module contains a few CustomConfig classes that are used in both eager mode and FX graph mode quantization.
FuseCustomConfig | Custom configuration for |
PrepareCustomConfig | Custom configuration for |
ConvertCustomConfig | Custom configuration for |
StandaloneModuleConfigEntry |
torch.ao.quantization.quantizer#
torch.ao.quantization.pt2e (quantization in pytorch 2.0 export implementation)#
torch.ao.quantization.pt2e.export_utils#
model_is_exported | Return True if the torch.nn.Module was exported, False otherwise (e.g. |
torch.ao.quantization.pt2e.lowering#
lower_pt2e_quantized_to_x86 | Lower a PT2E-quantized model to x86 backend. |
PT2 Export (pt2e) Numeric Debugger#
generate_numeric_debug_handle | Attach numeric_debug_handle_id for all nodes in the graph module of the given ExportedProgram, like conv2d, squeeze, conv1d, etc, except for placeholder. |
CUSTOM_KEY | A str constant. |
NUMERIC_DEBUG_HANDLE_KEY | A str constant. |
prepare_for_propagation_comparison | Add output loggers to node that has numeric_debug_handle |
extract_results_from_loggers | For a given model, extract the tensors stats and related information for each debug handle. |
compare_results | Given two dicts mapping from debug_handle_id (int) to a list of tensors, return a map from debug_handle_id to NodeAccuracySummary that contains comparison information like SQNR, MSE, etc. |
torch (quantization related functions)#
This describes the quantization related functions of the torch namespace.
quantize_per_tensor | Converts a float tensor to a quantized tensor with given scale and zero point. |
quantize_per_channel | Converts a float tensor to a per-channel quantized tensor with given scales and zero points. |
dequantize | Returns an fp32 Tensor by dequantizing a quantized Tensor |
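A short illustration of the per-tensor functions above. The input values, scale, and zero point are arbitrary choices made so that every value is exactly representable.

```python
import torch

x = torch.tensor([-1.0, 0.0, 0.5, 1.0])

# Quantize with scale 0.5 and zero point 10 into unsigned 8-bit storage.
q = torch.quantize_per_tensor(x, scale=0.5, zero_point=10, dtype=torch.quint8)

stored = q.int_repr()          # underlying uint8 values: x / scale + zero_point
restored = torch.dequantize(q)  # back to fp32: (stored - zero_point) * scale
```

With these parameters the stored integers are 8, 10, 11, and 12, and dequantizing recovers the original values exactly.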
torch.Tensor (quantization related methods)#
Quantized Tensors support a limited subset of data manipulation methods of the regular full-precision tensor.
view | Returns a new tensor with the same data as the |
as_strided | |
expand | Returns a new view of the |
flatten | |
select | |
ne | See |
eq | See |
ge | See |
le | See |
gt | See |
lt | See |
copy_ | Copies the elements from |
clone | |
dequantize | Given a quantized Tensor, dequantize it and return the dequantized float Tensor. |
equal | |
int_repr | Given a quantized Tensor, |
max | See |
mean | See |
min | See |
q_scale | Given a Tensor quantized by linear (affine) quantization, returns the scale of the underlying quantizer(). |
q_zero_point | Given a Tensor quantized by linear (affine) quantization, returns the zero_point of the underlying quantizer(). |
q_per_channel_scales | Given a Tensor quantized by linear (affine) per-channel quantization, returns a Tensor of scales of the underlying quantizer. |
q_per_channel_zero_points | Given a Tensor quantized by linear (affine) per-channel quantization, returns a tensor of zero_points of the underlying quantizer. |
q_per_channel_axis | Given a Tensor quantized by linear (affine) per-channel quantization, returns the index of dimension on which per-channel quantization is applied. |
resize_ | Resizes |
sort | See |
topk | See |
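The quantization parameter accessors listed above can be exercised as follows. This is a small sketch with arbitrary values; the tensors and parameters are illustrative and chosen to be exactly representable.

```python
import torch

# Per-tensor quantized tensor: a single scale and zero point.
qt = torch.quantize_per_tensor(
    torch.tensor([1.0, 2.0]), scale=0.5, zero_point=3, dtype=torch.quint8
)
per_tensor_params = (qt.q_scale(), qt.q_zero_point())

# Per-channel quantized tensor: one scale and zero point per row (axis 0).
w = torch.tensor([[1.0, -1.0], [2.0, 4.0]])
qw = torch.quantize_per_channel(
    w,
    scales=torch.tensor([0.5, 2.0]),
    zero_points=torch.tensor([0, 0]),
    axis=0,
    dtype=torch.qint8,
)
channel_scales = qw.q_per_channel_scales()
channel_zero_points = qw.q_per_channel_zero_points()
channel_axis = qw.q_per_channel_axis()
channel_ints = qw.int_repr()  # row 0 divided by 0.5, row 1 divided by 2.0
```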
torch.ao.quantization.observer#
This module contains observers which are used to collect statistics about the values observed during calibration (PTQ) or training (QAT).
ObserverBase | Base observer Module. |
MinMaxObserver | Observer module for computing the quantization parameters based on the running min and max values. |
MovingAverageMinMaxObserver | Observer module for computing the quantization parameters based on the moving average of the min and max values. |
PerChannelMinMaxObserver | Observer module for computing the quantization parameters based on the running per channel min and max values. |
MovingAveragePerChannelMinMaxObserver | Observer module for computing the quantization parameters based on the running per channel min and max values. |
HistogramObserver | The module records the running histogram of tensor values along with min/max values. |
PlaceholderObserver | Observer that doesn't do anything and just passes its configuration to the quantized module's |
RecordingObserver | The module is mainly for debug and records the tensor values during runtime. |
NoopObserver | Observer that doesn't do anything and just passes its configuration to the quantized module's |
get_observer_state_dict | Returns the state dict corresponding to the observer stats. |
load_observer_state_dict | Given input model and a state_dict containing model observer stats, load the stats back into the model. |
default_observer | Default observer for static quantization, usually used for debugging. |
default_placeholder_observer | Default placeholder observer, usually used for quantization to torch.float16. |
default_debug_observer | Default debug-only observer. |
default_weight_observer | Default weight observer. |
default_histogram_observer | Default histogram observer, usually used for PTQ. |
default_per_channel_weight_observer | Default per-channel weight observer, usually used on backends where per-channel weight quantization is supported, such as fbgemm. |
default_dynamic_quant_observer | Default observer for dynamic quantization. |
default_float_qparams_observer | Default observer for a floating point zero-point. |
AffineQuantizedObserverBase | Observer module for affine quantization (pytorch/ao) |
Granularity | Base class for representing the granularity of quantization. |
MappingType | How floating point number is mapped to integer number |
PerAxis | Represents per-axis granularity in quantization. |
PerBlock | Represents per-block granularity in quantization. |
PerGroup | Represents per-channel group granularity in quantization. |
PerRow | Represents row-wise granularity in quantization. |
PerTensor | Represents per-tensor granularity in quantization. |
PerToken | Represents per-token granularity in quantization. |
TorchAODType | Placeholder for dtypes that do not exist in PyTorch core yet. |
ZeroPointDomain | Enum that indicate whether zero_point is in integer domain or floating point domain |
get_block_size | Get the block size based on the input shape and granularity type. |
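To make the role of an observer concrete, here is a minimal sketch that feeds one tensor through a MinMaxObserver and reads back the derived quantization parameters. The input values are arbitrary.

```python
import torch
from torch.ao.quantization.observer import MinMaxObserver

# Asymmetric per-tensor observer targeting unsigned 8-bit values (0..255).
observer = MinMaxObserver(dtype=torch.quint8, qscheme=torch.per_tensor_affine)

# Observers are modules: calling one records statistics and returns the input.
observer(torch.tensor([-1.0, 0.0, 3.0]))

# Derive scale and zero point from the recorded min (-1.0) and max (3.0).
scale, zero_point = observer.calculate_qparams()
```

Here the scale comes out as (3 - (-1)) / 255 and the zero point as 64, so the observed range maps onto the full quint8 range.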
torch.ao.quantization.fake_quantize#
This module implements modules which are used to perform fake quantization during QAT.
FakeQuantizeBase | Base fake quantize module. |
FakeQuantize | Simulate the quantize and dequantize operations in training time. |
FixedQParamsFakeQuantize | Simulate quantize and dequantize in training time. |
FusedMovingAvgObsFakeQuantize | Define a fused module to observe the tensor. |
default_fake_quant | Default fake_quant for activations. |
default_weight_fake_quant | Default fake_quant for weights. |
default_per_channel_weight_fake_quant | Default fake_quant for per-channel weights. |
default_histogram_fake_quant | Fake_quant for activations using a histogram. |
default_fused_act_fake_quant | Fused version of default_fake_quant, with improved performance. |
default_fused_wt_fake_quant | Fused version of default_weight_fake_quant, with improved performance. |
default_fused_per_channel_wt_fake_quant | Fused version of default_per_channel_weight_fake_quant, with improved performance. |
disable_fake_quant | Disable fake quantization for the module. |
enable_fake_quant | Enable fake quantization for the module. |
disable_observer | Disable observation for this module. |
enable_observer | Enable observation for this module. |
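The following sketch shows a fake quantize module in isolation, along with the enable and disable helpers listed above. The input tensor is random and the variable names are illustrative.

```python
import torch
from torch.ao.quantization.fake_quantize import (
    default_fake_quant,
    disable_fake_quant,
    enable_fake_quant,
)

# default_fake_quant is a partially applied constructor; call it to build a module.
fake_quant = default_fake_quant()

x = torch.randn(4, 4)

# Observes x, then quantizes and immediately dequantizes it (output stays fp32).
simulated = fake_quant(x)

# The helpers are written to be used with Module.apply.
fake_quant.apply(disable_fake_quant)
passthrough = fake_quant(x)  # fake quantization is off, so x passes through

fake_quant.apply(enable_fake_quant)
```

In QAT these modules are normally inserted by prepare_qat or prepare_qat_fx rather than constructed by hand.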
torch.ao.quantization.qconfig#
This module defines QConfig objects which are used to configure quantization settings for individual ops.
QConfig | Describes how to quantize a layer or a part of the network by providing settings (observer classes) for activations and weights respectively. |
default_qconfig | Default qconfig configuration. |
default_debug_qconfig | Default qconfig configuration for debugging. |
default_per_channel_qconfig | Default qconfig configuration for per channel weight quantization. |
default_dynamic_qconfig | Default dynamic qconfig. |
float16_dynamic_qconfig | Dynamic qconfig with weights quantized to torch.float16. |
float16_static_qconfig | Dynamic qconfig with both activations and weights quantized to torch.float16. |
per_channel_dynamic_qconfig | Dynamic qconfig with weights quantized per channel. |
float_qparams_weight_only_qconfig | Dynamic qconfig with weights quantized with a floating point zero_point. |
default_qat_qconfig | Default qconfig for QAT. |
default_weight_only_qconfig | Default qconfig for quantizing weights only. |
default_activation_only_qconfig | Default qconfig for quantizing activations only. |
default_qat_qconfig_v2 | Fused version of default_qat_qconfig, has performance benefits. |
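Beyond the predefined qconfigs above, a QConfig can be assembled from observer classes. This is a sketch with one plausible combination (per-tensor activations, per-channel symmetric weights); the name custom_qconfig is hypothetical.

```python
import torch
from torch.ao.quantization.observer import MinMaxObserver, PerChannelMinMaxObserver
from torch.ao.quantization.qconfig import QConfig

# QConfig holds observer constructors, not instances, hence with_args.
custom_qconfig = QConfig(
    activation=MinMaxObserver.with_args(dtype=torch.quint8),
    weight=PerChannelMinMaxObserver.with_args(
        dtype=torch.qint8, qscheme=torch.per_channel_symmetric
    ),
)

# Each call creates a fresh observer, so every layer gets its own statistics.
activation_observer = custom_qconfig.activation()
weight_observer = custom_qconfig.weight()
```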
torch.ao.nn.intrinsic#
This module implements the combined (fused) modules, such as conv + relu, which can then be quantized.
ConvReLU1d | This is a sequential container which calls the Conv1d and ReLU modules. |
ConvReLU2d | This is a sequential container which calls the Conv2d and ReLU modules. |
ConvReLU3d | This is a sequential container which calls the Conv3d and ReLU modules. |
LinearReLU | This is a sequential container which calls the Linear and ReLU modules. |
ConvBn1d | This is a sequential container which calls the Conv 1d and Batch Norm 1d modules. |
ConvBn2d | This is a sequential container which calls the Conv 2d and Batch Norm 2d modules. |
ConvBn3d | This is a sequential container which calls the Conv 3d and Batch Norm 3d modules. |
ConvBnReLU1d | This is a sequential container which calls the Conv 1d, Batch Norm 1d, and ReLU modules. |
ConvBnReLU2d | This is a sequential container which calls the Conv 2d, Batch Norm 2d, and ReLU modules. |
ConvBnReLU3d | This is a sequential container which calls the Conv 3d, Batch Norm 3d, and ReLU modules. |
BNReLU2d | This is a sequential container which calls the BatchNorm 2d and ReLU modules. |
BNReLU3d | This is a sequential container which calls the BatchNorm 3d and ReLU modules. |
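These containers are usually produced by fusion rather than built directly. The sketch below fuses a toy conv + batch norm + relu stack in eval mode; the layer sizes and input are arbitrary assumptions.

```python
import torch
from torch import nn
from torch.ao.quantization import fuse_modules

# Hypothetical float model; submodule names in a Sequential are "0", "1", "2".
model = nn.Sequential(nn.Conv2d(1, 2, 3), nn.BatchNorm2d(2), nn.ReLU()).eval()

# In eval mode, batch norm is folded into the conv weights during fusion.
fused = fuse_modules(model, [["0", "1", "2"]])

x = torch.randn(1, 1, 5, 5)
with torch.no_grad():
    reference = model(x)
    fused_output = fused(x)
```

After fusion the first slot holds a ConvReLU2d container and the remaining slots are replaced with nn.Identity, while the numerical output stays the same up to floating point error.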
torch.ao.nn.intrinsic.qat#
This module implements the versions of those fused operations needed for quantization aware training.
LinearReLU | A LinearReLU module fused from Linear and ReLU modules, attached with FakeQuantize modules for weight, used in quantization aware training. |
ConvBn1d | A ConvBn1d module is a module fused from Conv1d and BatchNorm1d, attached with FakeQuantize modules for weight, used in quantization aware training. |
ConvBnReLU1d | A ConvBnReLU1d module is a module fused from Conv1d, BatchNorm1d and ReLU, attached with FakeQuantize modules for weight, used in quantization aware training. |
ConvBn2d | A ConvBn2d module is a module fused from Conv2d and BatchNorm2d, attached with FakeQuantize modules for weight, used in quantization aware training. |
ConvBnReLU2d | A ConvBnReLU2d module is a module fused from Conv2d, BatchNorm2d and ReLU, attached with FakeQuantize modules for weight, used in quantization aware training. |
ConvReLU2d | A ConvReLU2d module is a fused module of Conv2d and ReLU, attached with FakeQuantize modules for weight for quantization aware training. |
ConvBn3d | A ConvBn3d module is a module fused from Conv3d and BatchNorm3d, attached with FakeQuantize modules for weight, used in quantization aware training. |
ConvBnReLU3d | A ConvBnReLU3d module is a module fused from Conv3d, BatchNorm3d and ReLU, attached with FakeQuantize modules for weight, used in quantization aware training. |
ConvReLU3d | A ConvReLU3d module is a fused module of Conv3d and ReLU, attached with FakeQuantize modules for weight for quantization aware training. |
update_bn_stats | |
freeze_bn_stats |
torch.ao.nn.intrinsic.quantized#
This module implements the quantized implementations of fused operations like conv + relu. There are no BatchNorm variants, as BatchNorm is usually folded into convolution for inference.
BNReLU2d | A BNReLU2d module is a fused module of BatchNorm2d and ReLU |
BNReLU3d | A BNReLU3d module is a fused module of BatchNorm3d and ReLU |
ConvReLU1d | A ConvReLU1d module is a fused module of Conv1d and ReLU |
ConvReLU2d | A ConvReLU2d module is a fused module of Conv2d and ReLU |
ConvReLU3d | A ConvReLU3d module is a fused module of Conv3d and ReLU |
LinearReLU | A LinearReLU module fused from Linear and ReLU modules |
torch.ao.nn.intrinsic.quantized.dynamic#
This module implements the quantized dynamic implementations of fused operations like linear + relu.
LinearReLU | A LinearReLU module fused from Linear and ReLU modules that can be used for dynamic quantization. |
torch.ao.nn.qat#
This module implements versions of the key nn modules Conv2d() and Linear() which run in FP32 but with rounding applied to simulate the effect of INT8 quantization.
Conv2d | A Conv2d module attached with FakeQuantize modules for weight, used for quantization aware training. |
Conv3d | A Conv3d module attached with FakeQuantize modules for weight, used for quantization aware training. |
Linear | A linear module attached with FakeQuantize modules for weight, used for quantization aware training. |
torch.ao.nn.qat.dynamic#
This module implements versions of the key nn modules such as Linear() which run in FP32 but with rounding applied to simulate the effect of INT8 quantization and will be dynamically quantized during inference.
Linear | A linear module attached with FakeQuantize modules for weight, used for dynamic quantization aware training. |
torch.ao.nn.quantized#
This module implements the quantized versions of the nn layers such as torch.nn.Conv2d and torch.nn.ReLU.
ReLU6 | Applies the element-wise function: |
Hardswish | This is the quantized version of |
ELU | This is the quantized equivalent of |
LeakyReLU | This is the quantized equivalent of |
Sigmoid | This is the quantized equivalent of |
BatchNorm2d | This is the quantized version of |
BatchNorm3d | This is the quantized version of |
Conv1d | Applies a 1D convolution over a quantized input signal composed of several quantized input planes. |
Conv2d | Applies a 2D convolution over a quantized input signal composed of several quantized input planes. |
Conv3d | Applies a 3D convolution over a quantized input signal composed of several quantized input planes. |
ConvTranspose1d | Applies a 1D transposed convolution operator over an input image composed of several input planes. |
ConvTranspose2d | Applies a 2D transposed convolution operator over an input image composed of several input planes. |
ConvTranspose3d | Applies a 3D transposed convolution operator over an input image composed of several input planes. |
Embedding | A quantized Embedding module with quantized packed weights as inputs. |
EmbeddingBag | A quantized EmbeddingBag module with quantized packed weights as inputs. |
FloatFunctional | State collector class for float operations. |
FXFloatFunctional | module to replace FloatFunctional module before FX graph mode quantization, since activation_post_process will be inserted in top level module directly |
QFunctional | Wrapper class for quantized operations. |
Linear | A quantized linear module with quantized tensor as inputs and outputs. |
LayerNorm | This is the quantized version of |
GroupNorm | This is the quantized version of |
InstanceNorm1d | This is the quantized version of |
InstanceNorm2d | This is the quantized version of |
InstanceNorm3d | This is the quantized version of |
torch.ao.nn.quantized.functional#
Functional interface (quantized).
This module implements the quantized versions of the functional layers such as torch.nn.functional.conv2d and torch.nn.functional.relu. Note: supports quantized inputs.
avg_pool2d | Applies 2D average-pooling operation in regions by step size steps. |
avg_pool3d | Applies 3D average-pooling operation in regions by step size steps. |
adaptive_avg_pool2d | Applies a 2D adaptive average pooling over a quantized input signal composed of several quantized input planes. |
adaptive_avg_pool3d | Applies a 3D adaptive average pooling over a quantized input signal composed of several quantized input planes. |
conv1d | Applies a 1D convolution over a quantized 1D input composed of several input planes. |
conv2d | Applies a 2D convolution over a quantized 2D input composed of several input planes. |
conv3d | Applies a 3D convolution over a quantized 3D input composed of several input planes. |
interpolate | Down/up samples the input to either the given |
linear | Applies a linear transformation to the incoming quantized data: y = xA^T + b. |
max_pool1d | Applies a 1D max pooling over a quantized input signal composed of several quantized input planes. |
max_pool2d | Applies a 2D max pooling over a quantized input signal composed of several quantized input planes. |
celu | Applies the quantized CELU function element-wise. |
leaky_relu | Quantized version of the. |
hardtanh | This is the quantized version of |
hardswish | This is the quantized version of |
threshold | Applies the quantized version of the threshold function element-wise: |
elu | This is the quantized version of |
hardsigmoid | This is the quantized version of |
clamp | float(input, min_, max_) -> Tensor |
upsample | Upsamples the input to either the given |
upsample_bilinear | Upsamples the input, using bilinear upsampling. |
upsample_nearest | Upsamples the input, using nearest neighbours' pixel values. |
torch.ao.nn.quantizable#
This module implements the quantizable versions of some of the nn layers. These modules can be used in conjunction with the custom module mechanism, by providing the custom_module_config argument to both prepare and convert.
LSTM | A quantizable long short-term memory (LSTM). |
MultiheadAttention |
torch.ao.nn.quantized.dynamic#
Dynamically quantized Linear, LSTM, LSTMCell, GRUCell, and RNNCell.
Linear | A dynamic quantized linear module with floating point tensor as inputs and outputs. |
LSTM | A dynamic quantized LSTM module with floating point tensor as inputs and outputs. |
GRU | Applies a multi-layer gated recurrent unit (GRU) RNN to an input sequence. |
RNNCell | An Elman RNN cell with tanh or ReLU non-linearity. |
LSTMCell | A long short-term memory (LSTM) cell. |
GRUCell | A gated recurrent unit (GRU) cell |
Quantized dtypes and quantization schemes#
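Note that operator implementations currently only support per-channel quantization for weights of the conv and linear operators. Furthermore, the input data is mapped linearly to the quantized data and vice versa as follows:

.. math::

    \begin{aligned}
        \text{Quantization:}&\\
        &Q_\text{out} = \text{clamp}(x_\text{input}/s+z, Q_\text{min}, Q_\text{max})\\
        \text{Dequantization:}&\\
        &x_\text{out} = (Q_\text{input}-z)*s
    \end{aligned}

where :math:`\text{clamp}(.)` is the same as clamp(), while the scale :math:`s` and zero point :math:`z` are then computed as described in MinMaxObserver, specifically:

.. math::

    \begin{aligned}
        \text{if Symmetric:}&\\
        &s = 2 \max(|x_\text{min}|, x_\text{max}) /
            \left( Q_\text{max} - Q_\text{min} \right) \\
        &z = \begin{cases}
            0 & \text{if dtype is qint8} \\
            128 & \text{otherwise}
        \end{cases}\\
        \text{Otherwise:}&\\
        &s = \left( x_\text{max} - x_\text{min} \right) /
            \left( Q_\text{max} - Q_\text{min} \right) \\
        &z = Q_\text{min} - \text{round}(x_\text{min} / s)
    \end{aligned}

where :math:`[x_\text{min}, x_\text{max}]` denotes the range of the input data, while :math:`Q_\text{min}` and :math:`Q_\text{max}` are respectively the minimum and maximum values of the quantized dtype.
Note that the choice of :math:`s` and :math:`z` implies that zero is represented with no quantization error whenever zero is within the range of the input data or symmetric quantization is being used.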
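As a worked illustration of the asymmetric (affine) mapping described in this section, the plain Python sketch below computes a scale and zero point for an input range of [-1, 3] and a quint8 range of [0, 255]. The helper names are hypothetical, and rounding to the nearest integer before clamping is an assumption made so that the result is an integer.

```python
def affine_qparams(x_min, x_max, q_min, q_max):
    """Scale and zero point for asymmetric quantization of [x_min, x_max]."""
    scale = (x_max - x_min) / (q_max - q_min)
    zero_point = q_min - round(x_min / scale)
    return scale, zero_point


def quantize(x, scale, zero_point, q_min, q_max):
    """Map a float to an integer, then clamp it into [q_min, q_max]."""
    q = round(x / scale) + zero_point
    return max(q_min, min(q_max, q))


def dequantize(q, scale, zero_point):
    """Map a quantized integer back to a float."""
    return (q - zero_point) * scale


scale, zero_point = affine_qparams(-1.0, 3.0, 0, 255)

q_zero = quantize(0.0, scale, zero_point, 0, 255)
q_large = quantize(10.0, scale, zero_point, 0, 255)  # out of range, gets clamped
zero_roundtrip = dequantize(q_zero, scale, zero_point)
```

The zero point comes out as 64, so the float value 0.0 maps to the integer 64 and back to exactly 0.0, which is the "no quantization error at zero" property noted above.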
Additional data types and quantization schemes can be implemented through the `custom operator mechanism <https://pytorch.org/tutorials/advanced/torch_script_custom_ops.html>`_.
torch.qscheme: Type to describe the quantization scheme of a tensor. Supported types:
- torch.per_tensor_affine: per tensor, asymmetric
- torch.per_channel_affine: per channel, asymmetric
- torch.per_tensor_symmetric: per tensor, symmetric
- torch.per_channel_symmetric: per channel, symmetric

torch.dtype: Type to describe the data. Supported types:
- torch.quint8: 8-bit unsigned integer
- torch.qint8: 8-bit signed integer
- torch.qint32: 32-bit signed integer
QAT Modules.
This package is in the process of being deprecated. Please use torch.ao.nn.qat.modules instead.
QAT Dynamic Modules.
This package is in the process of being deprecated. Please use torch.ao.nn.qat.dynamic instead.
This file is in the process of migration to torch/ao/quantization, and is kept here for compatibility while the migration process is ongoing. If you are adding a new entry/functionality, please add it to the appropriate files under torch/ao/quantization/fx/, while adding an import statement here.
Quantized Modules.
Note: The torch.nn.quantized namespace is in the process of being deprecated. Please use torch.ao.nn.quantized instead.
Quantized Dynamic Modules.
This file is in the process of migration to torch/ao/nn/quantized/dynamic, and is kept here for compatibility while the migration process is ongoing. If you are adding a new entry/functionality, please add it to the appropriate file under torch/ao/nn/quantized/dynamic, while adding an import statement here.