[GRPO] Log generation entropy #3700


Draft
LeonEricsson wants to merge 10 commits into huggingface:main from LeonEricsson:log_entropy

@LeonEricsson (Collaborator) commented Jul 7, 2025 (edited)

What does this PR do?

Log the average per-token entropy of the generated completions (normalized over the local batch, BNPO-style). Since entropy is not computed by default, you need to set log_entropy in the config. However, given the importance of this metric, I'm open to always computing entropy, provided that it doesn't introduce noticeable overhead.
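As a rough illustration of the metric described above, here is a minimal sketch of per-token entropy followed by a token-level mean over the local batch. It is not the PR's code: the name entropy_from_logits is borrowed from the review snippet further down, but its body here is an assumption, and mean_token_entropy and completion_mask are hypothetical names used only for this example.

```python
import torch
import torch.nn.functional as F


def entropy_from_logits(logits: torch.Tensor) -> torch.Tensor:
    """Shannon entropy (in nats) of the softmax distribution at each position.

    Assumed implementation for illustration; logits has shape (B, T, V).
    """
    log_probs = F.log_softmax(logits, dim=-1)          # (B, T, V)
    return -(log_probs.exp() * log_probs).sum(dim=-1)  # (B, T)


def mean_token_entropy(entropies: torch.Tensor, completion_mask: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: average entropy over all unmasked completion tokens
    in the local batch (sum over tokens divided by the token count)."""
    mask = completion_mask.to(entropies.dtype)
    return (entropies * mask).sum() / mask.sum().clamp(min=1.0)


# Toy batch: 2 completions, 3 positions, vocabulary of 4, uniform logits.
logits = torch.zeros(2, 3, 4)
completion_mask = torch.tensor([[1, 1, 0], [1, 0, 0]])  # 1 = real completion token
mean_entropy = mean_token_entropy(entropy_from_logits(logits), completion_mask)
```

With uniform logits over a vocabulary of 4, every token has entropy ln 4 (about 1.386 nats), so the masked mean is the same value regardless of how many padding positions each completion has.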

Fixes #3571

This PR also includes a minor refactoring of the _compute_loss function.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@LeonEricsson marked this pull request as ready for review July 7, 2025 08:56
@LeonEricsson marked this pull request as draft July 7, 2025 08:57
@LeonEricsson marked this pull request as ready for review July 7, 2025 09:34
@LeonEricsson (Collaborator, Author)

Test run comparing the compute cost of calculating entropy. Currently, I'm only using a batch size of 12 and a maximum completion length of 400.

[Image: W&B chart, 07_07_2025, 13_25_13]

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@LeonEricsson marked this pull request as draft July 10, 2025 17:20
@LeonEricsson marked this pull request as ready for review July 10, 2025 17:31
    entropy_mask = None

    if self.log_entropy and entropies is not None:
        with torch.no_grad():
Contributor


I think it'd be more efficient if the torch.no_grad() context were placed here:

    if compute_entropy:
        entropies = entropy_from_logits(logits)
        all_entropies.append(entropies)

It should theoretically be faster and involve fewer memory ops.
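To make the suggestion concrete, here is a minimal sketch of wrapping the entropy computation itself in torch.no_grad(), so that no autograd graph is built for it. It assumes entropy is used for logging only and never feeds into the loss. chunked_entropies is a hypothetical stand-in for the loop quoted above, not the actual _get_per_token_logps_and_entropies, and the body of entropy_from_logits is again an assumption.

```python
import torch


def entropy_from_logits(logits: torch.Tensor) -> torch.Tensor:
    # Assumed implementation: entropy of the softmax distribution over the last dim.
    log_probs = torch.log_softmax(logits, dim=-1)
    return -(log_probs.exp() * log_probs).sum(dim=-1)


def chunked_entropies(logits: torch.Tensor, chunk_size: int = 2, compute_entropy: bool = True):
    """Hypothetical helper: compute per-token entropies chunk by chunk."""
    all_entropies = []
    for start in range(0, logits.size(0), chunk_size):
        chunk = logits[start : start + chunk_size]
        if compute_entropy:
            # No graph is recorded here, so the softmax intermediates are not
            # kept alive for a backward pass that would never use them.
            with torch.no_grad():
                entropies = entropy_from_logits(chunk)
            all_entropies.append(entropies)
    return torch.cat(all_entropies, dim=0) if all_entropies else None


logits = torch.randn(4, 5, 8, requires_grad=True)  # (B, T, V) toy logits
entropies = chunked_entropies(logits)
```

The returned tensor has requires_grad set to False even though the input logits track gradients, which is the behavior the review comment is after. If entropy were ever added to the loss (for example as a regularizer), this placement would silently cut its gradient, so the sketch only fits the logging use case.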

Collaborator, Author


Yeah, I was considering this but didn’t want to complicate things with _get_per_token_logps_and_entropies. I’ll take another look and see if I can find a clean way to incorporate it. Either way, I’m leaning toward always logging entropy instead of making it configurable, which will probably change things.

@LeonEricsson marked this pull request as draft July 15, 2025 09:43
Reviewers

@pramodith left review comments

Development

Successfully merging this pull request may close these issues.

[GRPO] Entropy metric
3 participants
@LeonEricsson @HuggingFaceDocBuilderDev @pramodith
