
Propose to refactor output normalization in several transformers #11850


Draft
tolgacangoz wants to merge 12 commits into huggingface:main from tolgacangoz:transfer-shift_scale_norm-to-AdaLayerNorm

Conversation

@tolgacangoz (Contributor) commented on Jul 2, 2025 (edited)

(I attempted to make these replacements, if you don't mind :)

This proposed PR will be activated when the SkyReels-V2 integration PR is merged into `main`.

Replace `FP32LayerNorm` with `AdaLayerNorm` in `WanTransformer3DModel`, `WanVACETransformer3DModel`, ..., to simplify the forward pass and enhance model parallelism compatibility.

Context: #11518 (comment)
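For illustration, here is a minimal before/after sketch of the output-norm step. The class names and the projection inside `OutputNormAfter` are assumptions made for this sketch, not the exact `AdaLayerNorm` signature in diffusers:

```python
# Minimal before/after sketch of the proposed refactor. Module and argument
# names are illustrative assumptions, not the exact diffusers API.
import torch
import torch.nn as nn


class OutputNormBefore(nn.Module):
    """Current pattern: a plain LayerNorm plus a manually applied shift/scale table."""

    def __init__(self, dim: int):
        super().__init__()
        self.scale_shift_table = nn.Parameter(torch.randn(1, 2, dim) / dim**0.5)
        self.norm_out = nn.LayerNorm(dim, eps=1e-6, elementwise_affine=False)

    def forward(self, hidden_states: torch.Tensor, temb: torch.Tensor) -> torch.Tensor:
        # Broadcast the table against the time embedding, then split it into
        # the shift and scale that modulate the normalized activations.
        shift, scale = (self.scale_shift_table + temb.unsqueeze(1)).chunk(2, dim=1)
        return self.norm_out(hidden_states) * (1 + scale) + shift


class OutputNormAfter(nn.Module):
    """Proposed pattern: a single adaptive-norm module owns the whole step."""

    def __init__(self, dim: int):
        super().__init__()
        # One projection produces both shift and scale from the conditioning.
        self.linear = nn.Linear(dim, 2 * dim)
        self.norm = nn.LayerNorm(dim, eps=1e-6, elementwise_affine=False)

    def forward(self, hidden_states: torch.Tensor, temb: torch.Tensor) -> torch.Tensor:
        shift, scale = self.linear(temb).unsqueeze(1).chunk(2, dim=-1)
        return self.norm(hidden_states) * (1 + scale) + shift
```

Since `norm_out` then owns both the projection and the normalization, listing it in `_no_split_modules` keeps the whole adaptive step on one device under model parallelism.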

Commits:

- Replace the final `FP32LayerNorm` and manual shift/scale application with a single `AdaLayerNorm` module in both `WanTransformer3DModel` and `WanVACETransformer3DModel`. This change simplifies the forward pass by encapsulating the adaptive normalization logic within the `AdaLayerNorm` layer, removing the need for a separate `scale_shift_table`. The `_no_split_modules` list is also updated to include `norm_out` for compatibility with model parallelism.
- Update the key mapping for the `head.modulation` layer to `norm_out.linear` in the model conversion script. This correction ensures that weights are loaded correctly for both standard and VACE transformer models.
- Replace the manual implementation of adaptive layer normalization, which used a separate `scale_shift_table` and `nn.LayerNorm`, with the unified `AdaLayerNorm` module. This simplifies the forward pass logic in several transformer models by encapsulating the normalization and modulation steps into a single component. It also adds `norm_out` to `_no_split_modules` for model parallelism compatibility.
- Correct the target key for `head.modulation` to `norm_out.linear.weight`. This ensures the weights are correctly mapped to the weight parameter of the output normalization layer during model conversion for both transformer types.
- Add a default zero-initialized bias tensor for the transformer's output normalization layer if it is missing from the original state dictionary (see the conversion sketch after this list).
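A hedged sketch of the conversion-script side of the change; the rename table and helper below are hypothetical, and the real diffusers script handles many more keys and shape details:

```python
# Illustrative conversion helper. RENAME_DICT and convert_state_dict are
# hypothetical names; only the two behaviors from the commits are shown.
import torch

RENAME_DICT = {
    # Previously this key targeted the standalone "scale_shift_table"; the PR
    # retargets the modulation parameters at the AdaLayerNorm projection.
    "head.modulation": "norm_out.linear.weight",
}


def convert_state_dict(original_state_dict: dict) -> dict:
    converted = {}
    for key, tensor in original_state_dict.items():
        new_key = key
        for old, new in RENAME_DICT.items():
            new_key = new_key.replace(old, new)
        converted[new_key] = tensor

    # If the original checkpoint carries no bias for the output-norm
    # projection, synthesize a zero-initialized one so load_state_dict
    # succeeds. (Shape handling here is simplified for the sketch.)
    weight_key = "norm_out.linear.weight"
    bias_key = "norm_out.linear.bias"
    if weight_key in converted and bias_key not in converted:
        converted[bias_key] = torch.zeros(converted[weight_key].shape[0])
    return converted
```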
@tolgacangoz changed the title from "Refactor output normalization in several transformers" to "Propose to refactor output normalization in several transformers" on Jul 3, 2025
@tolgacangoz force-pushed the transfer-shift_scale_norm-to-AdaLayerNorm branch from dad0e68 to 65639d5 on July 18, 2025 07:09
Reviewers: No reviews
Assignees: No one assigned
Labels: None yet
Projects: None yet
Milestone: No milestone
1 participant: @tolgacangoz
