GPTQ Lite implementation #555
base: main
Conversation
Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>
codecov bot commented Nov 13, 2025 (edited)

Codecov Report ❌ Patch coverage is
Coverage Diff (main vs. #555):
- Coverage: 74.36% → 73.92% (-0.45%)
- Files: 182 (unchanged)
- Lines: 18216 → 18395 (+179)
- Hits: 13547 → 13598 (+51)
- Misses: 4669 → 4797 (+128)
View full report in Codecov by Sentry.
Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>
    gt=0.0,
    le=1.0,
    title="Percentage damping factor.",
    description="The percentage of average Hessian diagonal used for damping.",
If you have a reference from the original paper about what these are, could you also share the link?
batch_size = input.shape[0]
# Incremental averaging: scale down old hessian
hessian *= n_samples / (n_samples + batch_size)
What's the dtype of hessian? Do we need to upcast to fp32 for this division?
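For illustration, a minimal sketch of the incremental update above with an explicit fp32 upcast. The function name and rescaling follow the hunk; the 2·XᵀX accumulation is the standard GPTQ Hessian and is an assumption, not necessarily the PR's exact code.

```python
import torch

def update_hessian(inp: torch.Tensor, hessian: torch.Tensor, n_samples: int):
    batch_size = inp.shape[0]
    # Flatten to (tokens, in_features); accumulate in fp32 so repeated
    # rescaling and addition do not lose precision in fp16/bf16.
    x = inp.reshape(-1, inp.shape[-1]).to(torch.float32)
    hessian = hessian.to(torch.float32)
    hessian *= n_samples / (n_samples + batch_size)              # scale down old estimate
    hessian += (2.0 / (n_samples + batch_size)) * (x.t() @ x)    # add new batch's contribution
    return hessian, n_samples + batch_size
```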
hessian, n_samples = update_hessian(input[0], state["hessian"], state["n_samples"])
hessian_state[module.name] = {"hessian": hessian, "n_samples": n_samples}
torch.cuda.empty_cache()
gc.collect()
Do we have to do gc.collect() here? It's going to be very slow.
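One possible restructuring, as a sketch with hypothetical helper names: keep the cheap per-module torch.cuda.empty_cache() but run the expensive Python GC once per calibration pass instead of once per hook call.

```python
import gc
import torch

def on_module_calibrated(module, hessian, n_samples, hessian_state):
    hessian_state[module.name] = {"hessian": hessian, "n_samples": n_samples}
    torch.cuda.empty_cache()   # releases cached GPU blocks; relatively cheap

def on_calibration_pass_done():
    gc.collect()               # a full GC walk is slow; do it once, not per module
```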
# Phase 1: Collect statistics for quantizers
enable_stats_collection(model)
max_calibrate(model, forward_loop)
Do you need forward_loop here? Is this for weight amax calibration only?
state = hessian_state[module.name]
hessian = state["hessian"].to(module.weight.device)
blockwise_weight_update(module, hessian, block_size, percdamp)
torch.cuda.empty_cache()
Maybe you can del the hessian after applying blockwise_weight_update?
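A sketch of this suggestion, reusing the names from the hunk above:

```python
state = hessian_state[module.name]
hessian = state["hessian"].to(module.weight.device)
blockwise_weight_update(module, hessian, block_size, percdamp)
# Drop the device copy (and the cached CPU state, if it is no longer needed)
# before empty_cache() so the large Hessian tensor can actually be released.
del hessian
del hessian_state[module.name]
torch.cuda.empty_cache()
```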
hessian_state_path: str | None = ModeloptField(
    default=None,
    title="Path to the Hessian state file.",
    description="The path to the Hessian state file.",
Maybe state that if the path exists, we load the Hessians from the path instead of re-computing them.
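A sketch of the suggested behavior; `collect_hessian_state` below is a hypothetical helper standing in for the PR's Hessian collection pass.

```python
import os
import torch

def get_hessian_state(model, forward_loop, hessian_state_path: str | None = None):
    if hessian_state_path is not None and os.path.exists(hessian_state_path):
        return torch.load(hessian_state_path)                    # reuse saved Hessians
    hessian_state = collect_hessian_state(model, forward_loop)   # hypothetical
    if hessian_state_path is not None:
        torch.save(hessian_state, hessian_state_path)            # cache for the next run
    return hessian_state
```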
GPTQ lite does not perform sequential quantization of layers. This means that the updated
activations are not used to process the next layer.
Can you estimate how much effort would be needed to add this constraint? I am wondering if we can have a quick test to see what the accuracy impact is.
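For scoping the effort, a rough sketch of what sequential (layer-by-layer) processing would look like; `blocks`, `collect_hessians_for_block`, and `quantized_linears` are hypothetical helpers. The point is that each block's Hessians come from activations produced by already-updated blocks, so quantization error propagates the way it does at inference time.

```python
for block in blocks:                                    # e.g. decoder layers, in order
    hessians = collect_hessians_for_block(block, cached_inputs)
    for module in quantized_linears(block):
        blockwise_weight_update(module, hessians[module.name], block_size, percdamp)
    # Re-run the block with updated weights to produce inputs for the next block
    # (real transformer blocks may return tuples; this is simplified).
    cached_inputs = [block(x) for x in cached_inputs]
```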
block_size: int | None = ModeloptField(
    default=128,
    title="Block size for GPTQ weight update.",
    description="The block size for GPTQ weight update.",
)
This should be a multiple of the block_size used in quantization. We should explain that in the description as well.
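A possible guard, as a sketch: keep each GPTQ update block aligned with the weight quantizer's scale groups. `quant_block_size` is an illustrative value, not taken from the PR.

```python
quant_block_size = 16  # e.g. per-16-element block scales in the weight quantizer
assert block_size % quant_block_size == 0, (
    f"GPTQ block_size ({block_size}) must be a multiple of the quantization "
    f"block size ({quant_block_size})"
)
```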
    gt=0.0,
    le=1.0,
    title="Percentage damping factor.",
    description="The percentage of average Hessian diagonal used for damping.",
Could you also add some instructions here so users know what the impact of increasing/decreasing this parameter is?
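As background for such instructions, a sketch of how a percentage damping factor is typically applied in GPTQ, consistent with the field description above but not necessarily the PR's exact code. A larger percdamp makes the Hessian inverse more stable but the update less faithful to the measured curvature; a smaller percdamp tracks the data more closely but risks an ill-conditioned matrix.

```python
import torch

def dampen_hessian(hessian: torch.Tensor, percdamp: float) -> torch.Tensor:
    # Add a fraction of the mean Hessian diagonal to the diagonal before inversion.
    damp = percdamp * torch.mean(torch.diag(hessian))
    idx = torch.arange(hessian.shape[0], device=hessian.device)
    hessian[idx, idx] += damp
    return hessian
```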
tensor_mapping = {}
for name, module in model.named_modules():
    if is_quantized_linear(module) and module.weight_quantizer.is_enabled:
        in_features = module.weight.shape[1]
Can we use module.weight.shape[-1] instead, in case of a 3D weight?
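A minimal sketch of the suggested change, assuming a possible 3D (e.g. experts, out_features, in_features) weight layout:

```python
# The last dimension is in_features for both 2D (out, in) and 3D (experts, out, in)
# weights; module.weight.shape[1] would pick out_features in the 3D case.
in_features = module.weight.shape[-1]
```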
for name, module in model.named_modules():
    if is_quantized_linear(module) and module.weight_quantizer.is_enabled:
        module.input_quantizer.reset_amax()
        module.output_quantizer.reset_amax()
Do you know how much accuracy is impacted if we don't recalibrate the input quantizer?
What does this PR do?
Type of change: New feature
Overview: Adds support for the GPTQ algorithm. This PR implements a modified version of the official GPTQ algorithm ("GPTQ lite"); a key difference is that layers are not quantized sequentially, so updated activations are not used when processing the next layer.
Usage
Modify the "algorithm" field in quant_cfg to "gptq_lite".
Note: Does not currently work with AWQ
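An illustrative usage sketch; the base config and the calibration loader are placeholders, and only the "algorithm" override is what this PR adds.

```python
import copy
import modelopt.torch.quantization as mtq

quant_cfg = copy.deepcopy(mtq.INT4_BLOCKWISE_WEIGHT_ONLY_CFG)  # any non-AWQ weight quant config
quant_cfg["algorithm"] = "gptq_lite"

def forward_loop(model):
    for batch in calib_dataloader:   # your calibration data
        model(batch)

model = mtq.quantize(model, quant_cfg, forward_loop)
```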
Testing
Before your PR is "Ready for review"
Additional Information