
# reinforcement-learning-human-feedback

Here are 2 public repositories matching this topic...

mihirp1998/VADER

Video Diffusion Alignment via Reward Gradients. We improve a variety of video diffusion models such as VideoCrafter, OpenSora, ModelScope and StableVideoDiffusion by finetuning them using various reward models such as HPS, PickScore, VideoMAE, VJEPA, YOLO, Aesthetics etc.

reinforcement-learning alignment rl diffusion vader rlhf video-diffusion video-diffusion-alignment reinforcement-learning-human-feedback

Updated Mar 12, 2025
Python

yflyzhang/RankPO

RankPO: Rank Preference Optimization

information-retrieval dpo large-language-models llm rlhf rlaif reinforcement-learning-human-feedback direct-preference-optimization

Updated Mar 17, 2025
Python
