- Notifications
You must be signed in to change notification settings - Fork2k
GRPO: Pack Responses within the same group.#3642
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
base:main
Are you sure you want to change the base?
Uh oh!
There was an error while loading.Please reload this page.
Conversation
pramodith commentedJul 2, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
@kashif and/or@qgallouedec wondering if either of you could take a look at this PR since you're familiar with how FA2 works with custom position ids. In particular I'm kind of lost on why Just for some context I'm trying to pack all the responses for a given prompt/query into a single row in the batch, before running a forward pass through the reference, old and current policy models. The thing I noticed is that despite sending in the right position ids FA2 doesn't seem to give me the same outputs with and without packing. For example, I tried this out trying to simulate how a packed GRPO group would look like. Tokens
I think that the logit scores for all the token ids |
Uh oh!
There was an error while loading.Please reload this page.
Pack Responses within the same group.
This PR addresses#3549 and packs all the responses belonging to the same group into a single row of the batch.
Before submitting
Pull Request section?
to it if that's the case.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.