enable smoothquant for int8 static tensor #3468
base: main
Conversation
Summary: This PR creates a new Int8Tensor and updates the configs to use the new Int8Tensor flow.

Test Plan:

To ensure BC:

```
pytest test/quantization/test_quant_api.py
```

To test the new Int8Tensor:

```
pytest test/quantization/quantize_/workflows/int8/test_int8_tensor.py
```

Reviewers:

Subscribers:

Tasks:

Tags:
pytorch-bot bot commented Dec 8, 2025 (edited)
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3468
Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Pending
As of commit 0c23589 with merge base f99105a (NEW FAILURE). The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.
jcaip commented Dec 8, 2025

cc @Xia-Weiwen and @cyxlily fyi
```
qw = quant_mod.weight
# Add smoothing factor metadata
qw = to_weight_tensor_with_linear_activation_scale_metadata(
```
we should not be using this; please check AWQ for how this should be implemented in the new stack:

ao/torchao/prototype/awq/api.py, lines 108 to 113 in 08e5e20:
```
assert isinstance(qw, SupportsActivationPreScaling), (
    "weight must support activation scaling through implementing `SupportsActivationPreScaling`"
)
# since we want to do `act` * `act_pre_scale` during runtime for speed, we'll save the
# reciprocal of the `equalization_scale`
qw.act_pre_scale = 1.0 / equalization_scale
```
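For context, a hedged sketch of how a weight carrying `act_pre_scale` might be consumed at runtime; the function name and the `dequantize()` fallback here are illustrative, not the actual kernel path:

```
import torch

def linear_with_act_pre_scale(act: torch.Tensor, qw) -> torch.Tensor:
    # SmoothQuant needs `act / equalization_scale`; because the weight stored
    # act_pre_scale = 1.0 / equalization_scale, a single multiply applies it
    if getattr(qw, "act_pre_scale", None) is not None:
        act = act * qw.act_pre_scale
    # ...then the usual (de)quantized linear path runs on the smoothed input
    return torch.nn.functional.linear(act, qw.dequantize())
```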
| """ | ||
| scale:torch.Tensor | ||
| scale:torch.Tensor=None |
nit: `Optional[torch.Tensor]`
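A minimal sketch of the suggested annotation (the enclosing class body here is illustrative):

```
from typing import Optional

import torch

class Int8Tensor(torch.Tensor):
    # None signals that no static scale was provided, so the annotation
    # should be Optional[torch.Tensor] rather than a bare torch.Tensor
    scale: Optional[torch.Tensor] = None
```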
```
[
    Int8DynamicActivationInt8WeightConfig(),
    Int8DynamicActivationInt8WeightConfig(version=2),
    # TODO: not sure if we should allow not passing scales as part of static config?
```
yeah I think it's fine

side note: we may need a separate API/flow for plain static quant without SmoothQuant if needed.
cyxlily commented Dec 17, 2025

@jcaip Our customer needs PerTensor activation quantization and PerRow weight quantization. Will you implement it, or may I create a new PR to do it?
jcaip commented Dec 17, 2025 (edited)
@cyxlily feel free to open a new PR for activation per-tensor x weight per-row; it's not something I'm planning to do currently. Thank you for your SmoothQuant PR btw; I used it to implement this.
This PR hooks up the static quant workflow added in #3442 to the prototype SmoothQuant API.
You can use the new flow as follows:
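A hedged sketch of what that flow could look like, using the prepare/calibrate/convert pattern of the prototype SmoothQuant API; `SmoothQuantConfig`, `SmoothQuantStep`, and the choice of base config are assumptions drawn from the surrounding discussion, not verbatim from this PR:

```
import torch
from torchao.quantization import Int8DynamicActivationInt8WeightConfig, quantize_
from torchao.prototype.smoothquant import SmoothQuantConfig, SmoothQuantStep  # assumed names

model = torch.nn.Sequential(torch.nn.Linear(64, 64)).eval()
base_config = Int8DynamicActivationInt8WeightConfig(version=2)  # assumed base config

# 1. Prepare: insert observers so activation statistics can be recorded
quantize_(model, SmoothQuantConfig(base_config=base_config, step=SmoothQuantStep.PREPARE))

# 2. Calibrate on representative inputs to collect activation maxima
for _ in range(8):
    model(torch.randn(1, 64))

# 3. Convert: fold the smoothing factors into the weights and swap in Int8Tensor
quantize_(model, SmoothQuantConfig(base_config=base_config, step=SmoothQuantStep.CONVERT))
```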