Runze Liu (RyanLiu112)

Incoming Ph.D. @ HKU & Master's student @ THU

compute-optimal-tts (Public)
Official codebase for "Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling".
Python, 267 stars, 21 forks

GenPRM (Public)
Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".
Python, 79 stars, 2 forks

Awesome-Process-Reward-Models (Public)
A comprehensive collection of process reward models.
95 stars, 1 fork

wizard-III/ArcherCodeR (Public)
ArcherCodeR is an open-source initiative enhancing code reasoning in large language models through scalable, rule-governed reinforcement learning.
Python, 5 stars

MRN (Public)
[NeurIPS 2022] Official codebase for "Meta-Reward-Net: Implicitly Differentiable Reward Learning for Preference-based Reinforcement Learning".
Python, 23 stars, 5 forks

ChangWinde/RAT (Public)
[AAAI 2025 Oral] Official code for "RAT: Adversarial Attacks on Deep Reinforcement Agents for Targeted Behaviors".
Python, 14 stars