dpo

Here are 77 public repositories matching this topic...

MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline. Trains a medical large language model, implementing incremental pretraining (PT), supervised fine-tuning (SFT), RLHF, DPO, ORPO, and GRPO.

  • Updated Mar 8, 2025
  • Python

Align Anything: Training All-modality Model with Feedback

  • Updated Mar 18, 2025
  • Python

A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs).

  • Updated Mar 8, 2025
  • Python
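
The entry above lists DPO alongside KTO, PPO, and ORPO as human-aware loss functions. For orientation, here is a minimal sketch of the standard DPO objective (Rafailov et al., 2023) in plain PyTorch; this is not that library's API, and the function and argument names are illustrative only.

    import torch.nn.functional as F

    def dpo_loss(policy_chosen_logps, policy_rejected_logps,
                 ref_chosen_logps, ref_rejected_logps, beta=0.1):
        """Standard DPO loss over per-sequence log-probabilities.

        Each argument is a (batch,) tensor of log pi(y|x) for the chosen or
        rejected completion under the trainable policy or the frozen reference.
        """
        # Log-ratios of policy to reference for each completion.
        chosen_logratios = policy_chosen_logps - ref_chosen_logps
        rejected_logratios = policy_rejected_logps - ref_rejected_logps
        # -log sigmoid(beta * margin): push the policy to prefer the chosen
        # completion more strongly than the reference model does.
        margin = chosen_logratios - rejected_logratios
        return -F.logsigmoid(beta * margin).mean()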

Easy and efficient fine-tuning of LLMs (supports LLama, LLama2, LLama3, Qwen, Baichuan, GLM, Falcon). Efficient quantized training and deployment of large models.

  • Updated Jan 24, 2025
  • Python

A Deep Learning NLP repository built with TensorFlow, covering everything from text preprocessing to downstream tasks for recent models such as Topic Models, BERT, GPT, and LLMs.

  • Updated Sep 6, 2024
  • Jupyter Notebook

Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"

  • Updated Jan 19, 2025
  • Python

An Efficient "Factory" to Build Multiple LoRA Adapters

  • Updated Feb 13, 2025
  • Python
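
The entry above is about building many LoRA adapters. As a generic point of reference (not that repository's code; the class name and defaults are placeholders), a single LoRA-adapted linear layer adds a trainable low-rank update on top of a frozen base weight:

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """Frozen linear layer plus a trainable low-rank (LoRA) update."""

        def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
            super().__init__()
            self.base = base
            for p in self.base.parameters():  # keep the pretrained weight frozen
                p.requires_grad = False
            # Low-rank factors: A maps down to rank r, B maps back up;
            # B starts at zero so the adapter is a no-op at initialization.
            self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
            self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
            self.scaling = alpha / r

        def forward(self, x):
            # Frozen base output plus the scaled low-rank correction.
            return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)

Training several such adapters against one shared base model stays cheap because only the small A/B factors differ between tasks.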

SiLLM simplifies the process of training and running Large Language Models (LLMs) on Apple Silicon by leveraging the MLX framework.

  • Updated Mar 17, 2025
  • Python

🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc.

  • Updated Mar 10, 2025
  • Python

Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-step Preference Optimization

  • Updated Dec 17, 2024
  • Python

[ICLR 2025] IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation

  • Updated Feb 19, 2025
  • Python

An RLHF Infrastructure for Vision-Language Models

  • Updated Nov 15, 2024
  • Python

Notus is a collection of fine-tuned LLMs using SFT, DPO, SFT+DPO, and/or any other RLHF techniques, while always keeping a data-first approach

  • Updated Jan 15, 2024
  • Python
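
Since the Notus entry above stresses a data-first approach, it may help to see the record format most DPO-style trainers consume: a prompt paired with one chosen and one rejected completion. The snippet below is a generic illustration; the field names follow the common prompt/chosen/rejected convention, and exact schemas vary by project.

    import json

    # One preference pair; such datasets are typically stored as JSON Lines.
    example = {
        "prompt": "Explain the difference between SFT and DPO in one sentence.",
        "chosen": "SFT imitates reference answers, while DPO trains on pairs of "
                  "preferred and dispreferred answers to the same prompt.",
        "rejected": "They are the same thing.",
    }

    with open("preferences.jsonl", "a", encoding="utf-8") as f:
        f.write(json.dumps(example) + "\n")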

This repository contains the code for SFT, RLHF, and DPO, designed for vision-based LLMs, including the LLaVA models and the LLaMA-3.2-vision models.

  • Updated Oct 16, 2024
  • Python

CodeUltraFeedback: aligning large language models to coding preferences

  • Updated Jun 25, 2024
  • Python

[ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction

  • Updated Feb 28, 2025
  • Python

[NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$

  • Updated Oct 23, 2024
  • Python

[ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning

  • Updated Jul 28, 2024
  • Python

A novel alignment framework that leverages image retrieval to mitigate hallucinations in Vision Language Models.

  • Updated Feb 19, 2025
  • Python
