[WIP] R1-Zero-like experiments #569


Draft
lewtun wants to merge 51 commits into main from r1-zero

Changes from 1 commit (Revert slurm). All 51 commits:
b5e6f9c  Add R1 Zero 7B (lewtun, Mar 29, 2025)
8a4af61  Fix chat template (lewtun, Mar 29, 2025)
9e0e478  Add new difficulty levels (lewtun, Mar 29, 2025)
b35213c  Add medium, hard, ultra hard recipes (lewtun, Mar 31, 2025)
1d6c0bb  Fix accuracy rewards (lewtun, Mar 31, 2025)
5747cfc  Return None for invalid samples (lewtun, Mar 31, 2025)
1078b73  Fix order of inputs (lewtun, Apr 1, 2025)
d9c8cd8  Use None for unferified (lewtun, Apr 1, 2025)
8f26046  Merge branch 'main' into r1-zero (lewtun, Apr 1, 2025)
5fe41f0  Pin trl (lewtun, Apr 1, 2025)
f22657b  Set defaults (lewtun, Apr 1, 2025)
82a1167  Log unique only (lewtun, Apr 1, 2025)
2897519  Revert config (lewtun, Apr 1, 2025)
d51de45  Use proper dataset (lewtun, Apr 2, 2025)
f1832c5  Pin TRL (lewtun, Apr 3, 2025)
995beb8  Clean up (lewtun, Apr 4, 2025)
1d7d66a  Merge branch 'main' into r1-zero (lewtun, Apr 4, 2025)
10a555b  Add soft format reward (lewtun, Apr 7, 2025)
0f98a5a  Fix soft reward to be really soft (lewtun, Apr 7, 2025)
23b7b69  Merge branch 'main' into r1-zero (lewtun, Apr 8, 2025)
f62e42a  Pin TRL for overlong masking (lewtun, Apr 8, 2025)
939c74c  Fix liger (lewtun, Apr 9, 2025)
9bed487  Add v01 (lewtun, Apr 9, 2025)
b29e672  Add level configs and DAPO (lewtun, Apr 10, 2025)
7a8dead  Fix (lewtun, Apr 11, 2025)
2d74588  Merge branch 'main' into r1-zero (lewtun, Apr 11, 2025)
c1d2352  Add q3 (lewtun, Apr 11, 2025)
8500f41  Parse GAS (lewtun, Apr 12, 2025)
3c312f8  Add hack for lighteval (lewtun, Apr 14, 2025)
b6a73c0  Merge branch 'main' into r1-zero (lewtun, Apr 16, 2025)
a5f3baa  Merge branch 'main' into r1-zero (lewtun, Apr 17, 2025)
f3920f8  Pin TRL (lewtun, Apr 17, 2025)
06bdd50  Merge branch 'main' into r1-zero (lewtun, Apr 17, 2025)
2f0b983  Add 32B recipe (lewtun, Apr 22, 2025)
be72ce6  Fix sharding in Slurm (lewtun, Apr 23, 2025)
0df1654  Tune recipe (lewtun, Apr 23, 2025)
c24ffd7  Fix attempt on Slurm (lewtun, Apr 23, 2025)
2715d31  Hack (lewtun, Apr 23, 2025)
cebaad5  Wait (lewtun, Apr 23, 2025)
2f4b0da  Revert slurm (lewtun, Apr 23, 2025)
f27c732  Fix (lewtun, Apr 23, 2025)
5f0b8f8  Remove hf-transfer in favour of hf-xet (lewtun, Apr 24, 2025)
46c1656  Pin transformers (lewtun, Apr 26, 2025)
2c0cac5  Merge branch 'main' into r1-zero (lewtun, Apr 26, 2025)
8d993d5  add gen batch exp config (edbeeching, May 5, 2025)
a82c1fd  adds weighted code reward (edbeeching, May 7, 2025)
d9a6c08  add latest configs (edbeeching, May 7, 2025)
464d951  Merge branch 'main' into r1-zero (lewtun, May 8, 2025)
b430693  Merge branch 'main' into r1-zero (edbeeching, May 9, 2025)
0ed9ea3  Merge branch 'main' into r1-zero (edbeeching, May 10, 2025)
a401d64  Merge branch 'main' into r1-zero (lewtun, May 25, 2025)
Revert slurm
lewtun committed Apr 23, 2025
commit 2f4b0daba915b13d9278529a46a855134284d484

recipes/OpenR1-Zero-32B-Math/grpo/config_v00.00.yaml (4 changes: 2 additions & 2 deletions)
@@ -24,8 +24,8 @@ use_vllm: true
 do_eval: false
 gradient_accumulation_steps: 16
 gradient_checkpointing: true
-gradient_checkpointing_kwargs:
-  use_reentrant: false
+#gradient_checkpointing_kwargs:
+#  use_reentrant: false
 hub_model_id: open-r1/R1-Zero-Qwen-32B-Math
 hub_model_revision: v00.00
 hub_strategy: every_save
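
The train.slurm script further down reads gradient_accumulation_steps straight out of this recipe so the accelerate launcher and the trainer agree on one value. A minimal Bash sketch of that extraction, assuming only the recipe path from this diff (the grep/awk line is the same one train.slurm uses):

CONFIG_FILE=recipes/OpenR1-Zero-32B-Math/grpo/config_v00.00.yaml
# Print the value after the YAML key; yields 16 for this recipe
GRAD_ACC_STEPS=$(grep 'gradient_accumulation_steps' $CONFIG_FILE | awk '{print $2}')
echo "Gradient accumulation steps: $GRAD_ACC_STEPS"
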
setup.py (3 changes: 1 addition & 2 deletions)
@@ -67,8 +67,7 @@
     "sentencepiece>=0.1.99",
     "torch==2.6.0",
     "transformers==4.51.2",
-    "trl @ git+https://github.com/huggingface/trl.git@294f35bf3c0043d3ee6b9b5d22385e5736f6ce9e", # Generate once per batch: https://github.com/huggingface/trl/pull/3283
-    "vllm==0.8.3",
+    "trl[vllm] @ git+https://github.com/huggingface/trl.git@294f35bf3c0043d3ee6b9b5d22385e5736f6ce9e", # Generate once per batch: https://github.com/huggingface/trl/pull/3283
     "wandb>=0.19.1",
 ]

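
Since the requirement string is a standard PEP 508 direct reference, the same pin can be reproduced outside setup.py with pip; a sketch (not part of the diff), using the commit hash pinned above:

# Installs TRL at the pinned commit (per-batch generation, TRL PR 3283) together with its vllm extra
pip install "trl[vllm] @ git+https://github.com/huggingface/trl.git@294f35bf3c0043d3ee6b9b5d22385e5736f6ce9e"
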
slurm/train.slurm (69 changes: 32 additions & 37 deletions)
@@ -5,7 +5,7 @@
 #SBATCH --gres=gpu:8
 #SBATCH --partition=hopper-prod # Adjust this for your cluster
 #SBATCH --output=./logs/%x-%j.out
-#SBATCH --err=./logs/%x-%j.err
+#SBATCH --error=./logs/%x-%j.err
 #SBATCH --requeue

 # Specific configuration optimized for the Hugging Face Compute Cluster
@@ -14,30 +14,15 @@ set -x -e

 source ~/.bashrc
 source openr1/bin/activate
 echo "START TIME: $(date)"

 MODEL=$1
 TASK=$2
 CONFIG_SUFFIX=$3
 ACCELERATOR=$4
 OPTIONAL_ARGS=$5
 CONFIG_FILE=recipes/$MODEL/$TASK/config_$CONFIG_SUFFIX.yaml

-# Special parsing to align GAS on accelerate and training configs
 GRAD_ACC_STEPS=$(grep 'gradient_accumulation_steps' $CONFIG_FILE | awk '{print $2}')
-
-# Split the string into individual arguments
-IFS=' ' read -ra ARGS <<< "$OPTIONAL_ARGS"
-# Loop through the arguments and find the one with "--gradient_accumulation_steps"
-for arg in "${ARGS[@]}"; do
-    if [[ "$arg" == "--gradient_accumulation_steps="* ]]; then
-        # Extract the value after the equals sign
-        GRAD_ACC_STEPS="${arg#*=}"
-        break # Exit the loop once we find the desired argument
-    fi
-done
-
-echo "Gradient accumulation steps: $GRAD_ACC_STEPS"

 MODEL=$(grep 'model_name_or_path:' $CONFIG_FILE | awk '{print $2}')
 REVISION=$(grep 'model_revision:' $CONFIG_FILE | head -n 1 | awk '{print $2}')

@@ -54,17 +39,15 @@ USE_VLLM="false"
 if [[ -f "$CONFIG_FILE" ]] && grep -qE '^\s*use_vllm:\s*true' "$CONFIG_FILE"; then
     USE_VLLM="true"
 fi
-# If using vLLM we need to reserve one node for the vLLM server and retain the rest for training
+# if using vllm
 if [[ "$USE_VLLM" == "true" ]]; then
     TRAIN_NODES=("${NODELIST[@]:0:$((NUM_NODES - 1))}")
     VLLM_NODE=${NODELIST[-1]} # Last node
     echo "Using vLLM server on node: $VLLM_NODE"
-    echo "Training nodes: ${TRAIN_NODES[*]}"
     TP=$(python scripts/get_tensor_parallel_size.py --model_name $MODEL --revision $REVISION --default_tp $GPUS_PER_NODE)
     WORLD_SIZE=$((WORLD_SIZE - GPUS_PER_NODE))
     NUM_NODES=$((NUM_NODES - 1))
-    echo "Reduced WORLD_SIZE: $WORLD_SIZE and NUM_NODES: $NUM_NODES"
     srun --nodes=1 --ntasks=1 --nodelist=$VLLM_NODE trl vllm-serve --model $MODEL --revision $REVISION --tensor_parallel_size $TP &

     OPTIONAL_ARGS="$OPTIONAL_ARGS --vllm_server_host=$VLLM_NODE"
 fi

@@ -76,22 +59,34 @@ export NCCL_ASYNC_ERROR_HANDLING=1
 # export NCCL_NSOCKS_PERTHREAD=1
 # export CUDA_LAUNCH_BLOCKING=1

+export CMD=" \
+    src/open_r1/$TASK.py --config $CONFIG_FILE $OPTIONAL_ARGS
+    "

-TRAIN_NODES_CSV=$(IFS=,; echo "${TRAIN_NODES[*]}")
+export LAUNCHER="HF_HUB_ENABLE_HF_TRANSFER=1 ACCELERATE_LOG_LEVEL=info TRANSFORMERS_VERBOSITY=info accelerate launch \
+    --config_file recipes/accelerate_configs/$ACCELERATOR.yaml \
+    --gradient_accumulation_steps $GRAD_ACC_STEPS \
+    --num_machines $NUM_NODES \
+    --num_processes $WORLD_SIZE \
+    --main_process_ip $MASTER_ADDR \
+    --main_process_port $MASTER_PORT \
+    --machine_rank $SLURM_PROCID \
+    --rdzv_backend=c10d \
+    --max_restarts 1 \
+    --tee 3 \
+    "
+# srun error handling:
+# --wait=60: wait 60 sec after the first task terminates before terminating all remaining tasks
+# --kill-on-bad-exit=1: terminate a step if any task exits with a non-zero exit code
+NODELIST=$(IFS=,; echo "${TRAIN_NODES[*]}")

-srun --nodes=$NUM_NODES \
-    --ntasks=$NUM_NODES \
-    --nodelist=$TRAIN_NODES_CSV \
-    accelerate launch \
-    --config_file recipes/accelerate_configs/$ACCELERATOR.yaml \
-    --gradient_accumulation_steps $GRAD_ACC_STEPS \
-    --num_machines $NUM_NODES \
-    --num_processes $WORLD_SIZE \
-    --main_process_ip $MASTER_ADDR \
-    --main_process_port $MASTER_PORT \
-    --machine_rank $SLURM_PROCID \
-    --rdzv_backend=c10d \
-    src/open_r1/$TASK.py --config $CONFIG_FILE $OPTIONAL_ARGS
+SRUN_ARGS=" \
+    --wait=60 \
+    --kill-on-bad-exit=1 \
+    --nodes=$NUM_NODES \
+    --ntasks=$NUM_NODES \
+    --nodelist=$NODELIST
+    "
+clear; srun $SRUN_ARGS --jobid $SLURM_JOB_ID bash -c "$LAUNCHER $CMD" 2>&1

+# wait for any background jobs (vLLM) before exiting
+wait
 echo "END TIME: $(date)"
