# reinforcement-learning-human-feedback
Here are 2 public repositories matching this topic...
Video Diffusion Alignment via Reward Gradients. We improve a variety of video diffusion models, such as VideoCrafter, OpenSora, ModelScope, and StableVideoDiffusion, by fine-tuning them with reward models such as HPS, PickScore, VideoMAE, VJEPA, YOLO, and Aesthetics.
reinforcement-learning, alignment, rl, diffusion, vader, rlhf, video-diffusion, video-diffusion-alignment, reinforcement-learning-human-feedback
Updated Mar 12, 2025 - Python
RankPO: Rank Preference Optimization
information-retrieval, dpo, large-language-models, llm, rlhf, rlaif, reinforcement-learning-human-feedback, direct-preference-optimization
Updated Mar 17, 2025 - Python