OLMost every recipe you need to experiment with the OLMo family of models.
- Install the cookbook CLI:

  ```bash
  pip install -e .[all]
  ```
- Set up your environment:

  ```bash
  gcloud auth application-default login
  gcloud auth application-default set-quota-project ai2-allennlp
  export GOOGLE_CLOUD_PROJECT=ai2-allennlp
  ```

  Optional (only if you are using Weka storage for token files):

  ```bash
  export WEKA_ENDPOINT_URL=<weka-endpoint-url>
  export WEKA_PROFILE=WEKA
  ```

  Note: Make sure you have WEKA and S3 profiles in your `~/.aws/config` and `~/.aws/credentials` files.
- Create a Beaker user account, request access to AI2 clusters, and create a Beaker user token.
- Set up your workspace:
  ```bash
  olmo-cookbook prepare-user-workspace \
    --workspace <workspace> \
    --beaker-token <beaker-token> \
    --aws-config <aws-config> \
    --aws-credentials <aws-credentials> \
    --wandb-api-key <wandb-api-key>
  ```

  Note: Weka / R2 endpoint URLs only need to be set if you are using them for storage.
To launch a training run, see `src/cookbook/recipes/train-1b-1xC-dclm.yaml` for an example recipe to clone.
Note: This cookbook relies on `beaker-py` under the hood and thus requires committing and pushing changes to configuration files before launching a job.
```bash
olmo-cookbook launch -c src/cookbook/recipes/train-1b-1xC-dclm.yaml
```

- Follow the interactive prompts. A link to the Beaker job will be provided upon successful submission.
- Monitor your training job in `wandb` or the Beaker UI.
To convert checkpoints for evaluation, use `olmo-cookbook-eval convert`. For models trained with OLMo:

```bash
olmo-cookbook-eval convert \
  "/oe-training-default/kevinf/checkpoints/OLMo-medium/peteish7-medlr/step477000" \
  -t olmo2 \
  --use-beaker \
  --huggingface-tokenizer allenai/dolma2-tokenizer
```

For models trained with OLMo-core:

```bash
olmo-cookbook-eval convert \
  "/oe-training-default/ai2-llm/checkpoints/peteish32-anneal/OLMo2-32Bparams-5Ttokens-100Banneal/step11921" \
  -t olmo-core \
  --use-beaker \
  --huggingface-tokenizer allenai/OLMo-2-1124-7B
```

For models trained with OLMo-core-v2:

```bash
olmo-cookbook-eval convert \
  "/oe-training-default/ai2-llm/checkpoints/mattj/olmo2-1b-1xC-all-dressed-noDedup-89adf213/step12202" \
  -t olmo-core-v2 \
  --use-beaker
```

🛑 WARNING: If using models trained with OLMo Core v2 converted before May 7, 2025, make sure to use `--model-args dtype=bfloat16` to avoid NaN with vLLM. 🛑
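For example (the checkpoint path and dashboard below are placeholders), the flag is passed alongside the other evaluate options:

```bash
olmo-cookbook-eval evaluate \
  "<path-to-converted-olmo-core-v2-checkpoint>" \
  --tasks core:mc \
  --model-backend vllm \
  --model-args dtype=bfloat16 \
  --dashboard <dashboard>
```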
Example evaluating a converted checkpoint:

```bash
olmo-cookbook-eval evaluate \
  "/oe-training-default/ai2-llm/checkpoints/OLMoE/a0125/olmoe-8x1b-newhp-newds-dolmino-seed-42/step23842-hf" \
  --tasks core:mc --tasks mmlu:mc --tasks mmlu:rc --tasks gen \
  --priority high \
  --cluster aus80g \
  --num-gpus 1 \
  --model-backend vllm \
  --dashboard olmoe-0125
```

Example evaluating a HuggingFace model:

```bash
olmo-cookbook-eval evaluate \
  mistralai/Mistral-Small-24B-Base-2501 \
  --tasks gen-no-jp \
  --priority high \
  --cluster aus80g \
  --num-gpus 1 \
  --model-backend vllm \
  --dashboard peteish32
```
To fetch results for a dashboard:

```bash
olmo-cookbook-eval results --dashboard peteish32 --tasks olmo2:int:mc
```
This will return a table of results for the internal MC tasks from the OLMo 2 days. You can also provide a list of `--tasks` to get results for specific tasks.

You can also provide a list of `--models` regular expressions to filter the models by name.

You can use `--format json` to see full results when model names are long.
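For example, to see full JSON results for a subset of models on that dashboard (the regular expression is a placeholder):

```bash
olmo-cookbook-eval results \
  --dashboard peteish32 \
  --tasks olmo2:int:mc \
  --models "peteish.*" \
  --format json
```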
You can launch any OLMo-core training script using the cookbook. By default, any script in `src/scripts/train` can be launched.
Here's an example of how to train a 1B model for 50B tokens on 16 GPUs on the `ai2/augusta-google-1` cluster.
```bash
olmo-cookbook-core launch \
  -d dolmino50 \
  -m OLMo2-1B \
  -n 50e9T \
  -i petew/olmo-core-tch260cu126-v2.0.1 \
  -p urgent \
  -c ai2/augusta-google-1 \
  -g 16
```
Let's break down the command:
- `-d dolmino50`: The data mix to use for training. This data mix is at `data/mixes/dolmino50.yaml`, but you can use any path to a data mix file (i.e., a plain text file with a list of npy token files).
- `-m OLMo2-1B`: The model to train. This is the configuration `src/scripts/train/OLMo2-1B.py`. You can also provide a path to any training script written in OLMo-core.
- `-n 50e9T`: The number of tokens to train on (50B tokens).
- `-i petew/olmo-core-tch260cu126-v2.0.1`: The image to use for training.
- `-p urgent`: The priority of the job.
- `-c ai2/augusta-google-1`: The cluster to use for training.
- `-g 16`: The number of GPUs to use for training.
Use the `--dry-run` flag to print the command without launching the job; to view all available flags, run `olmo-cookbook-core launch --help`.
At the moment, we pin OLMo-core to commit `2f66fd9`, but you can override this by setting the `--olmo-core-commit-hash` flag.
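For example, to preview the full command for the run above without submitting it, while overriding the pinned commit (the hash value is a placeholder):

```bash
olmo-cookbook-core launch \
  -d dolmino50 \
  -m OLMo2-1B \
  -n 50e9T \
  -i petew/olmo-core-tch260cu126-v2.0.1 \
  -p urgent \
  -c ai2/augusta-google-1 \
  -g 16 \
  --olmo-core-commit-hash <commit-hash> \
  --dry-run
```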
The EC2 CLI is a tool for managing EC2 instances. We will describe its use by example.
First, you want to install the cookbook CLI.
```bash
pip install -e .
```

Then, you can create a cluster of instances; by default, instances will be `i4i.xlarge` and will be tagged with the project name and owner; they will use the `us-east-1` region and your SSH key at `~/.ssh/id_rsa`.
Let's say you want to create a cluster named `chipstest`:
```bash
olmo-cookbook-ec2 create --name chipstest --number 5 --instance i4i.2xlarge --detach
```
This will create 5 instances as part of a cluster with the name `chipstest`; the `--detach` flag means that the process will return immediately and the instances will be created in the background.
You can check the status of the instances by listing them:
```bash
olmo-cookbook-ec2 list --name chipstest
```
After the instances are created, you will want to set up AWS credentials and the D2TK pipeline on them. You can do this by running the following command:
```bash
olmo-cookbook-ec2 setup-d2tk --name chipstest
```
To run a command on all instances in the cluster, you can use the following command:
```bash
olmo-cookbook-ec2 run --name chipstest --command "echo 'Hello, world!'"
```

Most likely, though, you will want to queue a bunch of jobs to run on the instances. You can do this by creating a directory with as many bash scripts as job units, and then running the following command:
```bash
olmo-cookbook-ec2 map --name chipstest --scripts-dir tmp/test_scripts
```
This will run all the scripts in the `tmp/test_scripts` directory on all the instances in the cluster.
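For example, a scripts directory with one script per job unit could be prepared like this (the job contents are purely illustrative):

```bash
mkdir -p tmp/test_scripts

# One bash script per job unit; replace the echo with the actual workload.
for i in 0 1 2 3; do
  cat > "tmp/test_scripts/job_${i}.sh" <<EOF
#!/usr/bin/env bash
echo "running job ${i}"
EOF
done

chmod +x tmp/test_scripts/*.sh
```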
Once you are done with the jobs, you can terminate the cluster:
```bash
olmo-cookbook-ec2 terminate --name chipstest
```
This will terminate all the instances in the cluster and delete the cluster.
The PMR CLI is a minimal alternative to Ray for distributed data processing on EC2 instances. It is primarily designed to work with version 2 of the Dolma toolkit.
First, install the cookbook CLI:
```bash
pip install -e .[all]
```
The CLI offers several commands for managing EC2 instances and executing tasks:
```bash
poormanray create --name chipstest --number 5 --instance i4i.2xlarge --detach
```
This will create 5 instances as part of a cluster named `chipstest`. The `--detach` flag makes the process return immediately while instances are created in the background.
You can customize the storage configuration for your instances using the `--storage-type` and `--storage-size` options. By default, storage is not configured (and thus left to whatever AWS defaults to).
```bash
poormanray create --name chipstest --number 5 --instance i4i.2xlarge --storage-type gp3 --storage-size 100 --detach
```
This will create 5 instances as part of a cluster named `chipstest` with 100 GB of gp3 storage.
You can check the status of the instances by listing them:

```bash
poormanray list --name chipstest --region us-east-1
```
```bash
# Set up the D2TK pipeline on the instances
poormanray setup-d2tk --name chipstest --ssh-key-path ~/.ssh/id_rsa

# Run a command on all instances in the cluster
poormanray run --name chipstest --command "echo 'Hello, world!'" --ssh-key-path ~/.ssh/id_rsa
```
You can distribute multiple scripts across your instances by creating a directory with bash scripts and using the `map` command:
```bash
poormanray map --name chipstest --script tmp/test_scripts --ssh-key-path ~/.ssh/id_rsa
```

This will distribute all executable scripts in the `tmp/test_scripts` directory evenly across all instances in the cluster.
```bash
poormanray terminate --name chipstest --region us-east-1
```
This will terminate all instances in the cluster.
By default, instances will be `i4i.xlarge`, will be tagged with the project name and owner, will use the `us-east-1` region, and will use your SSH key at `~/.ssh/id_rsa`.
All PMR CLI commands support the following options:
| Option | Short | Default | Description |
|---|---|---|---|
| `--name` | `-n` | (required) | Cluster name |
| `--instance-type` | `-t` | `i4i.xlarge` | EC2 instance type |
| `--number` | `-N` | 1 | Number of instances to create |
| `--region` | `-r` | `us-east-1` | AWS region |
| `--timeout` | `-T` | None | Command timeout in seconds |
| `--owner` | `-o` | Current user | Owner name for tagging instances |
| `--instance-id` | `-i` | None | Specific instance ID(s) to target (can be used multiple times) |
| `--ssh-key-path` | `-k` | `~/.ssh/id_rsa` | Path to SSH private key |
| `--ami-id` | `-a` | None | Custom AMI ID (defaults to latest Amazon Linux 2) |
| `--detach`/`--no-detach` | `-d`/`-nd` | `--no-detach` | Whether to detach after command execution |
| `--command` | `-c` | None | Command to execute on instances |
| `--script` | `-s` | None | Path to script file or directory to execute |
Note that you can provide either `--command` or `--script`, but not both. When using `--script` with a directory path, all executable files in that directory will be distributed across the instances.
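These options compose with the commands above; for example, to run a command on a single instance rather than the whole cluster (the instance ID below is a placeholder):

```bash
poormanray run --name chipstest --instance-id i-0123456789abcdef0 --command "df -h"
```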