PerToken
- class torch.ao.quantization.observer.PerToken
Represents per-token granularity in quantization.
This granularity type calculates a different set of quantization parameters for each token. A token is a slice along the last dimension of the tensor, so every combination of indices into the leading dimensions identifies one token.
For example, if the input tensor has shape [2, 3, 4], then there are 2 × 3 = 6 tokens with 4 elements each, and we will calculate 6 sets of quantization parameters, one for each token.
If the input tensor has only two dimensions, e.g. [8, 16], then this is equivalent to PerAxis(axis=0), which yields 8 sets of quantization parameters.
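To make the token count concrete, here is a minimal pure-Python sketch that treats a nested list as a tensor, splits it into tokens along the last dimension, and computes one `(scale, zero_point)` pair per token. The helpers `flatten_tokens` and `per_token_qparams` are hypothetical, and the asymmetric min/max scheme with an unsigned 8-bit range is an assumption chosen for illustration; the actual observer may use a different dtype, qscheme, or rounding rule.

```python
from typing import List, Tuple


def flatten_tokens(tensor) -> List[List[float]]:
    """Hypothetical helper: collect every innermost list, i.e. one token per slice along the last dim."""
    if tensor and isinstance(tensor[0], (int, float)):
        return [list(tensor)]
    tokens: List[List[float]] = []
    for sub in tensor:
        tokens.extend(flatten_tokens(sub))
    return tokens


def per_token_qparams(tensor, qmin: int = 0, qmax: int = 255) -> List[Tuple[float, int]]:
    """Hypothetical helper: asymmetric min/max (scale, zero_point) for each token (assumed scheme)."""
    params = []
    for token in flatten_tokens(tensor):
        lo = min(min(token), 0.0)  # include 0 so it stays exactly representable
        hi = max(max(token), 0.0)
        scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid a zero scale for all-zero tokens
        zero_point = qmin - round(lo / scale)
        zero_point = max(qmin, min(qmax, zero_point))  # clamp into the quantized range
        params.append((scale, zero_point))
    return params


# A tensor of shape [2, 3, 4] holding the values 0..23.
x = [[[float(i * 12 + j * 4 + k) for k in range(4)] for j in range(3)] for i in range(2)]
params = per_token_qparams(x)
print(len(params))  # 6: one parameter set per token
```

Every leading index pair `(i, j)` selects one token of 4 elements, which is why the shape [2, 3, 4] produces 6 parameter sets rather than 2 or 24.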
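One way to see the 2-D equivalence is to describe each granularity by the block of elements that share one set of parameters. The sketch below assumes that convention; `per_token_block`, `per_axis_block`, and `num_qparam_sets` are hypothetical helpers written for this illustration, not functions from `torch.ao.quantization`.

```python
from math import prod
from typing import Sequence, Tuple


def per_token_block(shape: Sequence[int]) -> Tuple[int, ...]:
    # Hypothetical: one block per token, size 1 on every leading dim and full size on the last dim.
    return (1,) * (len(shape) - 1) + (shape[-1],)


def per_axis_block(shape: Sequence[int], axis: int) -> Tuple[int, ...]:
    # Hypothetical: one block per index along `axis`, full size on every other dim.
    block = list(shape)
    block[axis] = 1
    return tuple(block)


def num_qparam_sets(shape: Sequence[int], block: Sequence[int]) -> int:
    # Each block shares one set of quantization parameters.
    return prod(shape) // prod(block)


# 2-D input: per-token and per-axis(axis=0) describe the same blocks.
print(per_token_block((8, 16)), per_axis_block((8, 16), 0))  # (1, 16) (1, 16)

# 3-D input: the two granularities diverge.
print(num_qparam_sets((2, 3, 4), per_token_block((2, 3, 4))))    # 6
print(num_qparam_sets((2, 3, 4), per_axis_block((2, 3, 4), 0)))  # 2
```

For a 2-D tensor the only leading dimension is axis 0, so "one set per token" and "one set per row" select the same 8 slices. With more dimensions, per-token quantization splits every leading dimension, while PerAxis(axis=0) splits only the first.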