Pull requests: huggingface/trl


Pull requests list

uses steps_per_generation in vllm max_num_seqs
#3747 opened Jul 19, 2025 by akakakakakaa
5 tasks
Add comment for average_tokens_across_devices
#3746 opened Jul 18, 2025 by qgallouedec
fix loss normalization with entropy mask
#3744 opened Jul 18, 2025 by hjh0119
5 tasks
Support dLLM in GRPO reference model creation
#3743 opened Jul 18, 2025 by xijia-tao
[GRPO] Fix: Processing ref logprobs in batches
#3740 opened Jul 16, 2025 by idanshen
Add basic support for FSDP/LoRA when using TRL/vLLM
#3735 opened Jul 14, 2025 by ojh31
5 tasks
🏗️ Refactor top-entropy in GRPO
#3727 opened Jul 12, 2025 by qgallouedec
Remove the negative value of KL divergence
#3710 opened Jul 9, 2025 by ENg-122
⚰️ Remove deprecated
#3704 opened Jul 8, 2025 by qgallouedec
5 tasks
[GRPO] Log generation entropy
#3700 opened Jul 7, 2025 by LeonEricsson (Draft)
2 of 5 tasks
FSDP2+GRPO
#3687 opened Jul 3, 2025 by SalmanMohammadi
5 tasks
Support FSDP2 in GRPOTrainer
#3670 opened Jun 30, 2025 by thepowerfuldeez
[SFT] Dry up the sft tests
#3657 opened Jun 27, 2025 by kashif
5 tasks
feat: Initial implementation of RePO trainer and components
#3655 opened Jun 26, 2025 by celsowm
5 tasks
Ensure Chat Template Safe Prompt Truncation
#3646 opened Jun 25, 2025 by pramodith
4 of 5 tasks
[WIP] vllm-server-spec-dec-support
#3643 opened Jun 24, 2025 by shirinyamani
5 tasks
GRPO: Pack Responses within the same group.
#3642 opened Jun 24, 2025 by pramodith (Draft)
4 of 5 tasks
Add Entropy Control to GRPOTrainer
#3628 opened Jun 22, 2025 by 1485840691
[WIP] [SFT] SFT doc rewrite
#3619 opened Jun 18, 2025 by qgallouedec
5 tasks
ClearML logging of visualization in RewardTrainer evaluation
#3602 opened Jun 16, 2025 by ioverho
2 of 5 tasks
Fix: corrected FSDP in GRPO trainer
#3582 opened Jun 13, 2025 by tryumanshow
2 of 5 tasks