Log the average per-token entropy of the generations/completions (normalized over the local batch, BNPO-style). Since entropy is not computed by default, you need to set log_entropy in the config. However, given the importance of this metric, I'm open to always computing entropy, provided that it doesn't introduce noticeable overhead.
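As a rough illustration of the metric described above, here is a framework-free sketch of a batch-normalized mean entropy. It assumes that "normalized over the local batch" means summing the entropies of all unmasked completion tokens and dividing by the total number of such tokens; the function names `token_entropy` and `batch_mean_entropy` are hypothetical and are not taken from the PR.

```python
import math

def token_entropy(logits):
    """Shannon entropy (in nats) of softmax(logits) for one token position."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def batch_mean_entropy(batch_logits, completion_mask):
    """Mean entropy over all unmasked tokens, using one batch-wide denominator."""
    total, count = 0.0, 0
    for seq_logits, seq_mask in zip(batch_logits, completion_mask):
        for logits, keep in zip(seq_logits, seq_mask):
            if keep:
                total += token_entropy(logits)
                count += 1
    return total / max(count, 1)  # guard against an all-padding batch

# Toy batch: 2 completions, 2 positions each, vocabulary of size 2.
batch_logits = [
    [[0.0, 0.0], [0.0, 0.0]],     # uniform at both positions
    [[50.0, -50.0], [0.0, 0.0]],  # near-deterministic token, then padding
]
completion_mask = [[1, 1], [1, 0]]

mean_entropy = batch_mean_entropy(batch_logits, completion_mask)
print(mean_entropy)
```

With a single batch-wide denominator, longer completions contribute more tokens to the average than shorter ones, which differs from averaging per sequence first and then across sequences.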

Fixes #3571

This PR also includes a minor refactoring of the _compute_loss function.

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes?
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

LeonEricsson and others added 4 commits

July 7, 2025 10:06

init

f79ca17

feat(grpo): add entropy logging

2653582

Merge pull request #1 from LeonEricsson/codex/implement-entropy-logging-in-grpo-training

Add entropy logging to GRPOTrainer

dcf889d

log entropy

7936888

LeonEricsson marked this pull request as ready for review

July 7, 2025 08:56

LeonEricsson marked this pull request as draft

July 7, 2025 08:57

LeonEricsson marked this pull request as ready for review

July 7, 2025 09:34

LeonEricsson (Collaborator, Author) commented Jul 7, 2025

Test run comparing the compute cost of calculating entropy. Currently, I'm only using a batch size of 12 and a maximum completion length of 400.

Merge branch 'main' into log_entropy

5c73091

HuggingFaceDocBuilderDev commented Jul 8, 2025

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

LeonEricsson added 2 commits

July 10, 2025 19:19

merge

5a80745

Merge branch 'log_entropy' of github.com:LeonEricsson/trl into log_entropy

461c87c

LeonEricsson marked this pull request as draft

July 10, 2025 17:20

LeonEricsson marked this pull request as ready for review

July 10, 2025 17:31

LeonEricsson added 2 commits

July 10, 2025 19:32

refactored part of loss computation

a1cf531

docstring

33f68ca

pramodith reviewed

Jul 12, 2025

trl/trainer/grpo_trainer.py (outdated, resolved)

detach gradients when logging + only compute mask when entropy masking

9d6f95a

pramodith reviewed

Jul 14, 2025


trl/trainer/grpo_trainer.py

    entropy_mask = None

    if self.log_entropy and entropies is not None:
        with torch.no_grad():

pramodith (Contributor), Jul 14, 2025

I think it'd be more efficient if the torch.no_grad() context were placed here:

trl/trl/trainer/grpo_trainer.py, lines 905 to 907 in 640a9f3

    if compute_entropy:
        entropies = entropy_from_logits(logits)
        all_entropies.append(entropies)

It should theoretically be faster and have fewer memory ops.
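For readers following the suggestion, a minimal sketch of the difference between computing entropies inside and outside a `torch.no_grad()` context. The helper `token_entropy` is a hypothetical stand-in written for this illustration, not the `entropy_from_logits` function in the repository, and the tensor sizes are arbitrary.

```python
import torch

def token_entropy(logits: torch.Tensor) -> torch.Tensor:
    """Shannon entropy (nats) of the softmax distribution at each position."""
    log_probs = torch.log_softmax(logits, dim=-1)
    return -(log_probs.exp() * log_probs).sum(dim=-1)

torch.manual_seed(0)
# Toy logits with shape (batch, sequence, vocab); sizes are illustrative only.
logits = torch.randn(2, 5, 11, requires_grad=True)

# Computed normally: the result is attached to the autograd graph,
# so intermediate tensors are kept alive for a possible backward pass.
tracked = token_entropy(logits)

# Computed for logging only: no graph is recorded for these operations.
with torch.no_grad():
    logged = token_entropy(logits)

print(tracked.requires_grad, logged.requires_grad)
```

Both calls return the same values; only the second avoids recording the operations for backpropagation, which is where the potential memory and speed savings would come from when the entropies are used purely as a logged metric.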

LeonEricsson (Collaborator, Author), Jul 15, 2025

Yeah, I was considering this but didn’t want to complicate things with _get_per_token_logps_and_entropies. I’ll take another look and see if I can find a clean way to incorporate it. Either way, I’m leaning toward always logging entropy instead of making it configurable, which will probably change things.

LeonEricsson marked this pull request as draft

July 15, 2025 09:43


3 participants


[GRPO] Log generation entropy #3700


LeonEricsson commented Jul 7, 2025 (edited)
