
Propose to refactor output normalization in several transformers #11850


Draft
tolgacangoz wants to merge 12 commits into huggingface:main from tolgacangoz:transfer-shift_scale_norm-to-AdaLayerNorm

Conversation

@tolgacangoz (Contributor) commented on Jul 2, 2025 (edited)

(I attempted to make these replacements, if you don't mind :)

This proposed PR will be activated when the SkyReels-V2 integration PR is merged into `main`.

Replace `FP32LayerNorm` with `AdaLayerNorm` in `WanTransformer3DModel`, `WanVACETransformer3DModel`, ..., to simplify the forward pass and enhance model parallelism compatibility.

Context: #11518 (comment)
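For illustration, here is a minimal before/after sketch of the output-norm step. The class names and the projection inside `OutputNormAfter` are assumptions made for this sketch, not the exact `AdaLayerNorm` signature in diffusers:

```python
# Minimal before/after sketch of the proposed refactor. Module and argument
# names are illustrative assumptions, not the exact diffusers API.
import torch
import torch.nn as nn


class OutputNormBefore(nn.Module):
    """Current pattern: a plain LayerNorm plus a manually applied shift/scale table."""

    def __init__(self, dim: int):
        super().__init__()
        self.scale_shift_table = nn.Parameter(torch.randn(1, 2, dim) / dim**0.5)
        self.norm_out = nn.LayerNorm(dim, eps=1e-6, elementwise_affine=False)

    def forward(self, hidden_states: torch.Tensor, temb: torch.Tensor) -> torch.Tensor:
        # Broadcast the table against the time embedding, then split it into
        # the shift and scale that modulate the normalized activations.
        shift, scale = (self.scale_shift_table + temb.unsqueeze(1)).chunk(2, dim=1)
        return self.norm_out(hidden_states) * (1 + scale) + shift


class OutputNormAfter(nn.Module):
    """Proposed pattern: a single adaptive-norm module owns the whole step."""

    def __init__(self, dim: int):
        super().__init__()
        # One projection produces both shift and scale from the conditioning.
        self.linear = nn.Linear(dim, 2 * dim)
        self.norm = nn.LayerNorm(dim, eps=1e-6, elementwise_affine=False)

    def forward(self, hidden_states: torch.Tensor, temb: torch.Tensor) -> torch.Tensor:
        shift, scale = self.linear(temb).unsqueeze(1).chunk(2, dim=-1)
        return self.norm(hidden_states) * (1 + scale) + shift
```

Since `norm_out` then owns both the projection and the normalization, listing it in `_no_split_modules` keeps the whole adaptive step on one device under model parallelism.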

Commits:

- Replace the final `FP32LayerNorm` and manual shift/scale application with a single `AdaLayerNorm` module in both `WanTransformer3DModel` and `WanVACETransformer3DModel`. This change simplifies the forward pass by encapsulating the adaptive normalization logic within the `AdaLayerNorm` layer, removing the need for a separate `scale_shift_table`. The `_no_split_modules` list is also updated to include `norm_out` for compatibility with model parallelism.
- Update the key mapping for the `head.modulation` layer to `norm_out.linear` in the model conversion script. This correction ensures that weights are loaded correctly for both standard and VACE transformer models.
- Replace the manual implementation of adaptive layer normalization, which used a separate `scale_shift_table` and `nn.LayerNorm`, with the unified `AdaLayerNorm` module. This simplifies the forward pass logic in several transformer models by encapsulating the normalization and modulation steps into a single component. It also adds `norm_out` to `_no_split_modules` for model parallelism compatibility.
- Correct the target key for `head.modulation` to `norm_out.linear.weight`. This ensures the weights are correctly mapped to the weight parameter of the output normalization layer during model conversion for both transformer types.
- Add a default zero-initialized bias tensor for the transformer's output normalization layer if it is missing from the original state dictionary (see the conversion sketch after this list).
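A hedged sketch of the conversion-script side of the change; the rename table and helper below are hypothetical, and the real diffusers script handles many more keys and shape details:

```python
# Illustrative conversion helper. RENAME_DICT and convert_state_dict are
# hypothetical names; only the two behaviors from the commits are shown.
import torch

RENAME_DICT = {
    # Previously this key targeted the standalone "scale_shift_table"; the PR
    # retargets the modulation parameters at the AdaLayerNorm projection.
    "head.modulation": "norm_out.linear.weight",
}


def convert_state_dict(original_state_dict: dict) -> dict:
    converted = {}
    for key, tensor in original_state_dict.items():
        new_key = key
        for old, new in RENAME_DICT.items():
            new_key = new_key.replace(old, new)
        converted[new_key] = tensor

    # If the original checkpoint carries no bias for the output-norm
    # projection, synthesize a zero-initialized one so load_state_dict
    # succeeds. (Shape handling here is simplified for the sketch.)
    weight_key = "norm_out.linear.weight"
    bias_key = "norm_out.linear.bias"
    if weight_key in converted and bias_key not in converted:
        converted[bias_key] = torch.zeros(converted[weight_key].shape[0])
    return converted
```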
@tolgacangoz changed the title from "Refactor output normalization in several transformers" to "Propose to refactor output normalization in several transformers" on Jul 3, 2025
@tolgacangoz force-pushed the transfer-shift_scale_norm-to-AdaLayerNorm branch from dad0e68 to 65639d5 on July 18, 2025 07:09
Reviewers: No reviews
Assignees: No one assigned
Labels: None yet
Projects: None yet
Milestone: No milestone
1 participant: @tolgacangoz
