dpo

Here are 77 public repositories matching this topic...

MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline. Trains a medical large language model, implementing incremental pretraining (PT), supervised fine-tuning (SFT), RLHF, DPO, ORPO, and GRPO.

  • Updated Mar 8, 2025
  • Python

Align Anything: Training All-modality Model with Feedback

  • Updated Mar 18, 2025
  • Python

A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs).

  • Updated Mar 8, 2025
  • Python
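
The entry above lists DPO alongside KTO, PPO, and ORPO as human-aware loss functions. For orientation, here is a minimal sketch of the standard DPO objective (Rafailov et al., 2023) in plain PyTorch; this is not that library's API, and the function and argument names are illustrative only.

    import torch.nn.functional as F

    def dpo_loss(policy_chosen_logps, policy_rejected_logps,
                 ref_chosen_logps, ref_rejected_logps, beta=0.1):
        """Standard DPO loss over per-sequence log-probabilities.

        Each argument is a (batch,) tensor of log pi(y|x) for the chosen or
        rejected completion under the trainable policy or the frozen reference.
        """
        # Log-ratios of policy to reference for each completion.
        chosen_logratios = policy_chosen_logps - ref_chosen_logps
        rejected_logratios = policy_rejected_logps - ref_rejected_logps
        # -log sigmoid(beta * margin): push the policy to prefer the chosen
        # completion more strongly than the reference model does.
        margin = chosen_logratios - rejected_logratios
        return -F.logsigmoid(beta * margin).mean()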

Easy and efficient fine-tuning of LLMs (supports LLama, LLama2, LLama3, Qwen, Baichuan, GLM, Falcon). Efficient quantized training and deployment of large models.

  • Updated Jan 24, 2025
  • Python

A Deep Learning NLP repository built with TensorFlow, covering everything from text preprocessing to downstream tasks for recent models such as Topic Models, BERT, GPT, and LLMs.

  • Updated Sep 6, 2024
  • Jupyter Notebook

Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"

  • Updated Jan 19, 2025
  • Python

An Efficient "Factory" to Build Multiple LoRA Adapters

  • Updated Feb 13, 2025
  • Python
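
The entry above is about building many LoRA adapters. As a generic point of reference (not that repository's code; the class name and defaults are placeholders), a single LoRA-adapted linear layer adds a trainable low-rank update on top of a frozen base weight:

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """Frozen linear layer plus a trainable low-rank (LoRA) update."""

        def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
            super().__init__()
            self.base = base
            for p in self.base.parameters():  # keep the pretrained weight frozen
                p.requires_grad = False
            # Low-rank factors: A maps down to rank r, B maps back up;
            # B starts at zero so the adapter is a no-op at initialization.
            self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
            self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
            self.scaling = alpha / r

        def forward(self, x):
            # Frozen base output plus the scaled low-rank correction.
            return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)

Training several such adapters against one shared base model stays cheap because only the small A/B factors differ between tasks.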

SiLLM simplifies the process of training and running Large Language Models (LLMs) on Apple Silicon by leveraging the MLX framework.

  • Updated Mar 17, 2025
  • Python

🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc.

  • Updated Mar 10, 2025
  • Python

Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-step Preference Optimization

  • Updated Dec 17, 2024
  • Python

[ICLR 2025] IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation

  • Updated Feb 19, 2025
  • Python

An RLHF Infrastructure for Vision-Language Models

  • Updated Nov 15, 2024
  • Python

Notus is a collection of fine-tuned LLMs using SFT, DPO, SFT+DPO, and/or any other RLHF techniques, while always keeping a data-first approach

  • Updated Jan 15, 2024
  • Python
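
Since the Notus entry above stresses a data-first approach, it may help to see the record format most DPO-style trainers consume: a prompt paired with one chosen and one rejected completion. The snippet below is a generic illustration; the field names follow the common prompt/chosen/rejected convention, and exact schemas vary by project.

    import json

    # One preference pair; such datasets are typically stored as JSON Lines.
    example = {
        "prompt": "Explain the difference between SFT and DPO in one sentence.",
        "chosen": "SFT imitates reference answers, while DPO trains on pairs of "
                  "preferred and dispreferred answers to the same prompt.",
        "rejected": "They are the same thing.",
    }

    with open("preferences.jsonl", "a", encoding="utf-8") as f:
        f.write(json.dumps(example) + "\n")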

This repository contains the code for SFT, RLHF, and DPO, designed for vision-based LLMs, including the LLaVA models and the LLaMA-3.2-vision models.

  • Updated Oct 16, 2024
  • Python

CodeUltraFeedback: aligning large language models to coding preferences

  • Updated Jun 25, 2024
  • Python

[ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction

  • Updated Feb 28, 2025
  • Python

[NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$

  • Updated Oct 23, 2024
  • Python

[ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning

  • Updated Jul 28, 2024
  • Python

A novel alignment framework that leverages image retrieval to mitigate hallucinations in Vision Language Models.

  • Updated Feb 19, 2025
  • Python
