GRPO: Pack Responses within the same group.#3642


Draft

pramodith wants to merge 21 commits into huggingface:main from pramodith:pramodith/grpo_group_packing

Conversation

pramodith (Contributor) commented Jun 24, 2025 (edited)

Pack Responses within the same group.

This PR addresses #3549 by packing all the responses belonging to the same group into a single row of the batch.
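As a rough sketch of the packing scheme (a hypothetical helper, not the PR's actual code): the shared prompt appears once, the group's responses are concatenated after it, and each response's position ids restart at `len(prompt)`, so every response "sees" the prompt at the same positions it would occupy in an unpacked batch.

```python
def pack_group(prompt, responses):
    """Pack a shared prompt and its G responses into one row.

    Hypothetical helper for illustration: the prompt's tokens appear
    once, responses are concatenated after it, and each response's
    position ids continue from len(prompt) regardless of where the
    response physically sits in the packed row.
    """
    input_ids = list(prompt)
    position_ids = list(range(len(prompt)))
    for resp in responses:
        input_ids.extend(resp)
        position_ids.extend(range(len(prompt), len(prompt) + len(resp)))
    return input_ids, position_ids


# A group with prompt [646, 647, 648] and responses [649] and [650, 651]
# packs into ids [646, 647, 648, 649, 650, 651] with positions [0, 1, 2, 3, 3, 4].
ids, pos = pack_group([646, 647, 648], [[649], [650, 651]])
```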

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

pramodith changed the title from "Pramodith/grpo group packing" to "GRPO: Pack Responses within the same group." on Jun 24, 2025
pramodith (Contributor, Author) commented Jul 2, 2025 (edited)

@kashif and/or @qgallouedec, wondering if either of you could take a look at this PR, since you're familiar with how FA2 works with custom position ids. In particular, I'm lost on why test_forward_pass_with_packing is failing. I've validated through my test cases and debugging that the logic for packing the inputs and unpacking the logits is accurate.

Just for some context: I'm trying to pack all the responses for a given prompt/query into a single row of the batch before running the forward pass through the reference, old, and current policy models.

What I noticed is that, despite passing the right position ids, FA2 does not give me the same outputs with and without packing.

For example, I tried the following to simulate what a packed GRPO group would look like. Tokens [646, 647, 648] are the prompt tokens here.

```python
pad_token_id = trainer.processing_class.pad_token_id

sample_input_ids = torch.tensor(
    [
        [646, 647, 648, 649, pad_token_id],
        [646, 647, 648, 650, 651],
    ],
    device=trainer.model.device,
)
sample_attention_mask = torch.tensor(
    [
        [1, 1, 1, 1, 0],
        [1, 1, 1, 1, 1],
    ],
    device=trainer.model.device,
)
sample_packed_input_ids = torch.tensor(
    [
        [646, 647, 648, 649, 650, 651],
    ],
    device=trainer.model.device,
)
sample_position_ids = torch.tensor(
    [
        [0, 1, 2, 3, 3, 4],
    ],
    device=trainer.model.device,
)

reg_logs = trainer.model(
    input_ids=sample_input_ids, attention_mask=sample_attention_mask
).logits
reg_logs = torch.gather(reg_logs, -1, sample_input_ids.unsqueeze(-1))

packed_logs = trainer.model(
    input_ids=sample_packed_input_ids,
    position_ids=sample_position_ids,
).logits
packed_logs = torch.gather(packed_logs, -1, sample_packed_input_ids.unsqueeze(-1))
```

I think the logit scores for all the token ids [646, 647, 648, 649, 650, 651] should match between reg_logs and packed_logs, but the logit scores for tokens 650 and 651 do not. So either FA2 isn't correctly accounting for packing, or I'm doing something wrong. Any pointers would be much appreciated. I suspect the attention mask isn't being generated correctly for this use case inside FA2, so I'm wondering if explicitly passing an attention_mask might help.
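One thing that might be worth checking: if the FA2 varlen path derives sequence boundaries from the packed position ids (say, starting a new sequence wherever the position id fails to increase by exactly 1), then in the example above the second response would land in its own segment and be masked off from the shared prompt, which would produce exactly this kind of mismatch for tokens 650 and 651. Below is a hypothetical helper to inspect what those inferred boundaries would be; this is a sketch of that assumption, not the actual transformers/FA2 code.

```python
def cu_seqlens_from_position_ids(position_ids):
    """Sketch: derive FlashAttention-style cumulative sequence lengths
    from a single packed row of position ids, assuming a new sequence
    starts wherever the position id does not increase by exactly 1.
    Hypothetical debugging helper, not the real library implementation.
    """
    starts = [0] + [
        i
        for i in range(1, len(position_ids))
        if position_ids[i] != position_ids[i - 1] + 1
    ]
    # Append the total length so consecutive entries bound each segment.
    return starts + [len(position_ids)]


# For the positions [0, 1, 2, 3, 3, 4] above, the inferred boundaries are
# [0, 4, 6]: segment one is [646, 647, 648, 649] and segment two is only
# [650, 651] -- without the prompt, so its logits cannot match the
# unpacked forward pass.
bounds = cu_seqlens_from_position_ids([0, 1, 2, 3, 3, 4])
```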
