preference-learning

Explore concepts like Self-Correct, Self-Refine, Self-Improve, Self-Contradict, Self-Play, and Self-Knowledge, alongside o1-like reasoning elevation🍓 and hallucination alleviation🍄.

decoding self-improvement knowledge-distillation data-augmentation reasoning self-consistency preference-learning hallucination self-correction attention-head large-language-models chain-of-thought large-language-model internal-consistency self-feedback self-refine self-correct

Updated Dec 7, 2024
Jupyter Notebook

qxcv/magical

Star 77

The MAGICAL benchmark suite for robust imitation learning (NeurIPS 2020)

reinforcement-learning imitation-learning preference-learning reinforcement-learning-environments

Updated Dec 5, 2023
Python

This repository contains the source code for our paper: "NaviSTAR: Socially Aware Robot Navigation with Hybrid Spatio-Temporal Graph Transformer and Preference Learning". For more details, please refer to our project website at https://sites.google.com/view/san-navistar.

machine-learning reinforcement-learning transformer preference-learning robot-navigation socially-aware-navigation

Updated Mar 8, 2025
Python

JanoschMenke/metis

Star 48

Python-based GUI for collecting chemists' feedback on molecules

machine-learning drug-discovery human-in-the-loop preference-learning de-novo-drug-design generative-ai

Updated Oct 15, 2024
Python

sail-sg/dice

Star 43

Official implementation of Bootstrapping Language Models via DPO Implicit Rewards

alignment preference-learning large-language-models rlhf

Updated Jul 29, 2024
Python

gao-g/prelude

Star 37

Code for the paper "Aligning LLM Agents by Learning Latent Preference from User Edits".

transformers alignment user-feedback edits interpretability preference-learning gpt4 llm llms human-feedback

Updated Nov 23, 2024
Python

CJReinforce/RIME_ICML2024

Star 28

Official code for the ICML 2024 Spotlight paper "RIME: Robust Preference-based Reinforcement Learning with Noisy Preferences"

reinforcement-learning deep-learning robotics artificial-intelligence manipulation locomotion preference-learning reinforcement-learning-from-human-feedback

Updated Oct 15, 2024
Python

liushunyu/awesome-direct-preference-optimization

Star 28

A Survey of Direct Preference Optimization (DPO)

review survey alignment preference-learning dpo large-language-models llm llms large-language-model reinforcement-learning-from-human-feedback direct-preference-optimization

Updated Mar 18, 2025

typoverflow/WiseRL

Star 19

PyTorch implementations for Offline Preference-Based RL (PbRL) algorithms

reinforcement-learning pytorch preference-learning

Updated Mar 24, 2025
Python

vicgalle/configurable-safety-tuning

Star 16

Data and models for the paper "Configurable Safety Tuning of Language Models with Synthetic Preference Data"

alignment safety preference-learning dpo llm

Updated Jul 27, 2024
Python

julilien/PLDepth

Star 14

Code for "Monocular Depth Estimation via Listwise Ranking using the Plackett-Luce Model" as published at CVPR 2021.

machine-learning deep-learning learning-to-rank cvpr weakly-supervised-learning preference-learning monocular-depth monocular-depth-estimation plackett-luce cvpr2021 relative-depth

Updated Feb 3, 2024
Python

SMARTlab-Purdue/SAN-FAPL

Star 8

This repository contains the source code for our paper: "Feedback-efficient Active Preference Learning for Socially Aware Robot Navigation", accepted to IROS 2022. For more details, please refer to our project website at https://sites.google.com/view/san-fapl.

machine-learning reinforcement-learning learning-from-demonstration preference-learning robot-navigation socially-aware-navigation