Context Parallel w/ Ring & Ulysses & Unified Attention #11941

Draft: a-r-r-o-w wants to merge 30 commits into main from attn-dispatcher-cp-and-training

@a-r-r-o-w (Member) commented Jul 16, 2025 (edited)

Adds support for ring, ulysses and unified attention natively. For a minimal PoC, I've limited changes to Flux.
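For readers new to the ring variant: it relies on the fact that softmax attention can be accumulated one key/value block at a time, so each rank can process blocks as they arrive from its neighbour. The sketch below is a single-process, pure-Python illustration of that accumulation only. It has no communication and no diffusers code, and the names `attention` and `ring_attention` are made up for this example rather than taken from the PR.

```python
import math


def attention(q, keys, values):
    """Reference softmax attention for one query vector over all keys/values."""
    scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


def ring_attention(q, kv_blocks):
    """Same result, but K/V are consumed block by block with a running softmax."""
    running_max = -math.inf
    running_sum = 0.0
    acc = [0.0] * len(kv_blocks[0][1][0])
    for keys, values in kv_blocks:  # one iteration per block received over the ring
        scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
        new_max = max(running_max, max(scores))
        # Rescale what was accumulated so far to the new maximum.
        scale = math.exp(running_max - new_max)  # exp(-inf) is 0.0 on the first block
        running_sum *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(scores, values):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * x for a, x in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]


q = [0.5, -1.0]
keys = [[1.0, 0.0], [0.0, 1.0], [2.0, -1.0], [-1.0, 0.5]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

full = attention(q, keys, values)
ring = ring_attention(q, [(keys[:2], values[:2]), (keys[2:], values[2:])])
print(max(abs(a - b) for a, b in zip(full, ring)))
```

The printed difference should be at floating-point noise level, which is why the sequence can be split across ranks without changing the attention output.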

Supported attention backends with CP: cuDNN, FA2, Sage.

Requires #11916 to be merged first.

Minimal example

```python
import torch
from diffusers import FluxPipeline

try:
    torch.distributed.init_process_group("nccl")
    rank = torch.distributed.get_rank()
    device = torch.device("cuda", rank % torch.cuda.device_count())
    torch.cuda.set_device(device)

    pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)
    pipe.to(device)
    # pipe.transformer.parallelize(ring_degree=2)
    pipe.transformer.parallelize(ulysses_degree=2)
    pipe.transformer.set_attention_backend("_native_cudnn")

    prompt = "A cat holding a sign that says 'hello world'"

    # Must specify generator so all ranks start with same latents (or pass your own)
    generator = torch.Generator().manual_seed(42)
    image = pipe(prompt, num_inference_steps=28, guidance_scale=4.0, generator=generator).images[0]

    if rank == 0:
        image.save("output.png")
except Exception as e:
    print(f"An error occurred: {e}")
    torch.distributed.breakpoint()
    raise
finally:
    if torch.distributed.is_initialized():
        torch.distributed.destroy_process_group()
```
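The example above uses `ulysses_degree=2`. As background, Ulysses-style attention is usually described as an all-to-all that trades sequence sharding for head sharding, so each rank sees the full sequence for a subset of heads. The sketch below simulates that layout change in one process with plain Python lists. It is an illustration under those assumptions, not the PR's code, and all function names here are hypothetical.

```python
def shard_sequence(x, world_size):
    """Split a [seq][heads] table into contiguous sequence chunks, one per rank."""
    chunk = len(x) // world_size
    return [x[r * chunk:(r + 1) * chunk] for r in range(world_size)]


def all_to_all_seq_to_heads(shards):
    """Simulated all-to-all: each rank ends up with the full sequence for its heads."""
    world_size = len(shards)
    heads_per_rank = len(shards[0][0]) // world_size
    out = []
    for dst in range(world_size):
        rows = []
        for src in range(world_size):  # visiting sources in rank order restores token order
            for token in shards[src]:
                rows.append(token[dst * heads_per_rank:(dst + 1) * heads_per_rank])
        out.append(rows)
    return out


def all_to_all_heads_to_seq(shards):
    """Inverse exchange: back to a sequence chunk holding all heads on each rank."""
    world_size = len(shards)
    chunk = len(shards[0]) // world_size
    out = []
    for dst in range(world_size):
        rows = []
        for i in range(dst * chunk, (dst + 1) * chunk):
            row = []
            for src in range(world_size):
                row.extend(shards[src][i])
            rows.append(row)
        out.append(rows)
    return out


# 4 tokens, 4 heads, 2 simulated ranks; each entry is labelled by token and head.
x = [[f"s{i}h{j}" for j in range(4)] for i in range(4)]
seq_sharded = shard_sequence(x, world_size=2)
head_sharded = all_to_all_seq_to_heads(seq_sharded)
restored = all_to_all_heads_to_seq(head_sharded)

for rank, rows in enumerate(head_sharded):
    print(rank, rows)
```

After the first exchange every simulated rank can run ordinary attention over all tokens for its own heads, and the second exchange returns the output to the sequence-sharded layout.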

Benchmarks

TODO

Explanation

Each model should define a `_cp_plan` attribute that contains information on how to shard/gather tensors at different stages of the forward.
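To make the shard/gather idea concrete, here is a toy sketch. The plan layout (`HYPOTHETICAL_CP_PLAN`, `split_dim`, `gather_dim`) and the helper names are invented for illustration and should not be read as the actual `_cp_plan` schema from this PR. Tensors are plain nested lists shaped `[batch][seq][dim]`, and the "ranks" are simulated in one process.

```python
# Hypothetical plan: which dim to split on the way in, which to gather on the way out.
# This is NOT the PR's real _cp_plan format, just an illustration of the concept.
HYPOTHETICAL_CP_PLAN = {
    "input": {"hidden_states": {"split_dim": 1}},
    "output": {"gather_dim": 1},
}


def split_along(x, dim, world_size):
    """Split a nested list into world_size contiguous chunks along `dim`."""
    if dim == 0:
        chunk = len(x) // world_size
        return [x[r * chunk:(r + 1) * chunk] for r in range(world_size)]
    per_item = [split_along(item, dim - 1, world_size) for item in x]
    return [[parts[r] for parts in per_item] for r in range(world_size)]


def concat_along(shards, dim):
    """Inverse of split_along: concatenate per-rank shards along `dim`."""
    if dim == 0:
        return [item for shard in shards for item in shard]
    return [
        concat_along([shard[i] for shard in shards], dim - 1)
        for i in range(len(shards[0]))
    ]


def toy_forward(hidden_states):
    """Stand-in for a per-token layer: doubles every value."""
    return [[[2.0 * v for v in token] for token in sample] for sample in hidden_states]


world_size = 2
hidden_states = [[[float(10 * s + d) for d in range(2)] for s in range(4)]]  # [1][4][2]

split_dim = HYPOTHETICAL_CP_PLAN["input"]["hidden_states"]["split_dim"]
gather_dim = HYPOTHETICAL_CP_PLAN["output"]["gather_dim"]

shards = split_along(hidden_states, split_dim, world_size)  # one shard per rank
outputs = [toy_forward(shard) for shard in shards]          # each rank runs its shard
gathered = concat_along(outputs, gather_dim)                # reassemble the sequence

print(gathered == toy_forward(hidden_states))
```

For per-token layers the sharded and unsharded paths agree trivially. Attention is the stage where tokens interact across shards, which is where the ring and Ulysses exchanges come in.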

TODO

Note: There were some merge conflicts that I'm not sure I resolved correctly, so some things may be broken. For this reason, I've removed training support and only tested inference. I'll update some of the TODOs tomorrow.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Reviewers

Awaiting requested review from @DN6, @yiyixuxu, @sayakpaul, @SunMarc.

At least 1 approving review is required to merge this pull request.

Assignees
No one assigned
Labels
roadmap (Add to current release roadmap)
Projects
Status: In Progress
Milestone
No milestone

2 participants
@a-r-r-o-w, @HuggingFaceDocBuilderDev