Olmo3 long context training support #125
base: main
Conversation
Left one comment about dependencies... otherwise LGTM. We probably shouldn't merge into main until the swafix is in?
```diff
 ]
 all = [
-    "ai2-olmo-core @ git+https://github.com/allenai/OLMo-core.git@c779ca546cc3194e73e7491aaefcdffbed042c65",
+    "ai2-olmo-core @ git+https://github.com/allenai/OLMo-core.git@tylerr/olmo3-scripts-swafix-foreachopt",
```
@undfined how have you been handling this for olmo3?
@tyler-romero do you think it's gonna get merged to main soon?
If not, we can make a new set of optional dependencies called "olmo3-lc-temp" or something like that.
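As an illustration, here is a minimal sketch of what that temporary extra could look like in pyproject.toml. The group name comes from this comment and the branch ref from the diff above; the exact table layout is an assumption, not taken from this repo's pyproject.toml:

```toml
[project.optional-dependencies]
# Hypothetical temporary extra, per the suggestion above; drop it once the
# swafix branch lands in OLMo-core main and the pin can move back to a commit.
olmo3-lc-temp = [
    "ai2-olmo-core @ git+https://github.com/allenai/OLMo-core.git@tylerr/olmo3-scripts-swafix-foreachopt",
]
```

It could then be installed with `pip install ".[olmo3-lc-temp]"` without touching the pin in the `all` extra.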
We're using the same olmo-core branch in the olmo3-anneals base branch for mid-training.
Ok, so we should keep this as a branch for now, @undfined, right?
Yep, that will be easiest in case we want to merge main into our feature branches.
Supports the Olmo3 model architecture and adds tools for long-context training.
Provides three example configs (which still need to be tweaked with actual long-context hyperparameters and data mixes).
For reference, our best 8k context-length pretraining config runs at 12.9k TPS (tokens per second) per device.
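As a unit sanity check on that figure (the concrete numbers here are hypothetical, not taken from the actual run): TPS/device = tokens per step / (step time in seconds × number of devices), so e.g. ~4.2M tokens per step at ~5.1 s/step on 64 devices gives 4.2e6 / (5.1 × 64) ≈ 12.9k TPS/device.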