Undfined/continual pt #146
Draft
undfined wants to merge 148 commits into main from undfined/continual-pt
Changes from 1 commit
Commits
148 commits
2740d49 add round1 anneal configs
aetting 39da9f2 Pin to swafix and add smoketest config
undfined e765967 Trainer config updates
undfined 14233fa More fixes
undfined d753873 oops
undfined 528bb19 More config tweaks
undfined 610a9de Imports
undfined 2a3e8cb Fix for WSD class bug
undfined 068b776 Match ac_config from swafix
undfined 613ab1a Match sliding window changes
undfined 4bbb982 More shenans
undfined 65eb9e4 Typo
undfined a1ac0da comment
undfined 1353c5a Can't load state with new dataset
undfined 7c523ee OOM
undfined b9df024 olmo3 settings and new paths
aetting 09f3a38 resources and web name
aetting 13ef591 new web paths
aetting 7b2afe8 Use improved scheduler branch
undfined 1b35a4d Merge branch 'undfined/swafix-core' into olmo3-anneals
aetting 335283b update round1 anneal paths (missing two)
aetting 8d834bb update example configs
aetting 657084e add web paths
aetting c167043 Merge branch 'undfined/swafix-core' into olmo3-anneals
aetting de07e7a consistency updates
aetting 71cf247 Use new dolmino math and update weights
undfined 6104825 Allow repetitions in hqweb
undfined 953ae4a Not enough tokens for dolmino
undfined e614cae Adjust reddit target
undfined 70684e2 Try double rbz
undfined 95859f9 oops
undfined ec6a6bc Back to 8192 rbz
undfined 106f175 try with float8
undfined 035d19a Newer torch
undfined 36cc983 dp tweaks
undfined 2520bdb match pretrain
undfined aee96c2 More tweaks for large job
undfined 8b085bd baseline dolmino anneal config
aetting d897712 paths bucket and format fix
aetting 95ef206 mj anneals rd1
4339fdd Tweaks for mj anneals
undfined a0544c2 Rejiggered ratios for OMR rewrites
e3b8261 merge
a562e4c restore trainer state from save folder
epwalsh 8520a13 Merge pull request #127 from allenai/epwalsh/olmo3-anneals
aetting dba258a update example priority
aetting 2979e40 restore model_and_optim
aetting a209758 add lr-test-config
aetting ba57e1f Added a bunch of nanoanneals
e595476 Merge branch 'olmo3-anneals' of github.com:allenai/olmo-cookbook into…
7983868 fixed typo
9fe0df4 added submodular dolmino math curves
70a1a94 path format consistency
aetting a11ea98 fix name
aetting 47211e3 added gs->weka tool
14d46c9 Merge branch 'olmo3-anneals' of github.com:allenai/olmo-cookbook into…
148b539 idk this is some luca thing maybe?
6cbe734 Added convert from config
c78dc1e diff convert
acbd449 tyler wanted me to do this, idk
fd233ad Merge branch 'main' into olmo3-anneals
de49cd6 Adds v2 hq fim stackedu microanneal
undfined cd707d9 convert with custom branching
14b95d3 Merge branch 'olmo3-anneals' of github.com:allenai/olmo-cookbook into…
90ac849 Adds v2++ hq fim stackedu microanneal
undfined 1670251 Merge branch 'olmo3-anneals' of github.com:allenai/olmo-cookbook into…
undfined ba62835 add 7T anneals
aetting b41ca95 step update
aetting d06c704 fix run names
aetting 12befcd add ae microanneals
aetting 6d42d1d bump up rank microbatch size
aetting e7e55ac Added eval script for midtraining
40408cf uncomment
5ed2c7b merge
d849517 Update README.md
revbucket 9dc008a add testrun
aetting d5f4d88 Rename olmo2 anneals and add olmo3-fim-code configs
undfined b1dd917 Added 'missing eval' stuff
f0eeaa6 Too many workers counting tokens
undfined 8eb33fc Merge branch 'olmo3-anneals' of github.com:allenai/olmo-cookbook into…
c471589 merged davidhs backfill stuff
8645829 increment eval version
0657fb0 Wrong weight for hqweb
undfined 29a3653 Merge branch 'olmo3-anneals' of github.com:allenai/olmo-cookbook into…
undfined 38aeeea add highthresh diverse qa config
aetting 6fd7250 Added mjnewmath-bestof
aba11c2 add wip anneal round 2 config
aetting 2144071 added kodkode mjicroanneals
3f32ad1 Merge branch 'olmo3-anneals' of github.com:allenai/olmo-cookbook into…
079cdf5 update reasoning and math ratios
aetting 8ae28c7 add round2 8T
aetting 371e551 8 nodes
aetting ed104ab update reasoning paths and run names
aetting 687bce5 updated code path
aetting fc81e62 updated paths
soldni 830c5c1 adjusting ratios
soldni 35d7cde merged main
d2cd5a9 add follow-up reasoning microanneals
aetting e8dc411 Adds 10b anneal with 35/30/35 web/code/etc ratios
undfined 984e751 add reddit lowthresh663 microanneal
aetting 5773e19 added megamath-web-pro-max anneals
7182650 more reddit lowthresh microanneals
aetting 647294d fix nonmc name
aetting bee78f9 cleaned up
0b90f6b Merge branch 'olmo3-anneals' of github.com:allenai/olmo-cookbook into…
b108729 remove model_and_optim in load_path
aetting c18c541 add round 3 macroanneal configs
aetting bc2c081 add 12T configs
aetting 31e480b lowthresh mcplusfull
aetting bcf3459 added rewrite checks
aa985bf convert from config hashes updated
d187de5 Added swallow anneals
e66c626 lowthresh add context v1
aetting 0825df8 fix path
aetting 68c6fee add 200B round3 config
aetting 0401ec8 more web paths
aetting 6ef5068 adjust math ratios and code path
aetting a9de822 adjust reasoning ratios
aetting 795e26d adjust reasoning ratios
aetting 2cf4c9c update name
aetting b12faae 16 nodes
aetting 59d42fc add omr fullthoughts baseline
aetting fafdcc5 psgqa microanneal
aetting fa8d330 psgqa microanneal name
aetting da8433d psgqa microanneal name
aetting 30d25b3 add no reasoning no instruct
aetting fbf6ff9 add no reasoning no instruct
aetting b50797f add nodes
aetting a5d0d3b fix dolmino ratio
aetting 65f7ebf add sub8k llamanemotron
aetting cc1f690 Added check of swallowmatt stuff
d79785a bumped nodes on fm4p
c71cd1a Added megamatt test anneals
05249d8 changed names
78bb0e4 some more swallowmath diversity experiments
5193269 correct token counts
ba1f52d Adds changes to support continual pretrain of olmo3
undfined c6f53f2 Fix config
undfined 33a52e2 Try with diff path
undfined e24a6d6 Can't load state with new dataset
undfined 8703207 OOM fix maybe
undfined 620e3ef Div by 0 not gud
undfined 98352fc Must set warmup
undfined a814101 Tweaks
undfined 336347e fix model validation
undfined b5e84a7 Try decay for a single token
undfined 14a8dd4 set lr floor for decay
undfined b048115 duh
psgqa microanneal
commit fafdcc5bdcee67c2c30d4531812f4b45d17f5664
33 changes: 33 additions & 0 deletions in src/cookbook/recipes/olmo3-midtraining/ae_microanneals/webv18-psgqav1-10B-microanneal.yaml
```yaml
name: "webv19-redditv1-10B-olmo3-microanneal"
description: "OLMo3 7b 10B web v18 + reddit v1 microanneal"
budget: "ai2/oe-base"
workspace: "ai2/olmo-3-microanneals"
nodes: 4
gpus: 8
preemptible: true
max_tokens: 10_000_000_000
global_batch_size: 2097152
sequence_length: 8192
seed: 1337
model: "olmo2_7B_swafix"
tokenizer: "dolma2"
priority: high
cluster: ai2/augusta-google-1
rank_microbatch_size: 16384
scheduler_type: linear
warmup_steps: 0
activation_checkpointing: true
annealing:
  enabled: true
load_path: gs://ai2-llm/checkpoints/OLMo3-7B-swafix/step289000
load_state: false
dataset:
  sources:
    - name: web
      target_ratio: 0.58
      paths:
        - s3://ai2-llm/preprocessed/cc_all_dressed/all_dressed_v3_subsamples/midtrain_pools/6B/allenai/dolma2-tokenizer/*.npy
    - name: psgqa
      target_ratio: 0.42
      paths:
        - s3://ai2-llm/preprocessed/wiki_psgqa_rewrites/psgqa_rewrites_v1/allenai/dolma2-tokenizer/*.npy
```
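The mixture and batch-size fields in this config imply a few derived quantities. The sketch below is plain Python, not cookbook code, and it assumes `rank_microbatch_size` and `global_batch_size` are measured in tokens (this may not match the cookbook's actual semantics):

```python
# Hypothetical sketch of the arithmetic implied by the config above;
# not the cookbook's implementation.
max_tokens = 10_000_000_000          # total microanneal budget
global_batch_size = 2_097_152        # tokens per optimizer step (assumed)
sequence_length = 8192
rank_microbatch_size = 16_384        # tokens per rank per forward pass (assumed)
world_size = 4 * 8                   # nodes * gpus

# Per-source token budgets from the mixture's target_ratio values.
sources = {"web": 0.58, "psgqa": 0.42}
assert abs(sum(sources.values()) - 1.0) < 1e-9  # ratios should cover the budget
budgets = {name: round(max_tokens * r) for name, r in sources.items()}

# Run length and gradient accumulation fall out of the batch sizes.
total_steps = max_tokens // global_batch_size
seqs_per_rank = rank_microbatch_size // sequence_length   # 2 sequences per pass
accum_steps = global_batch_size // (rank_microbatch_size * world_size)

print(budgets)        # {'web': 5800000000, 'psgqa': 4200000000}
print(total_steps)    # 4768
print(accum_steps)    # 4
```

Under these assumptions the 10B-token anneal splits into roughly 5.8B web and 4.2B psgqa tokens over ~4768 optimizer steps, with 4 accumulation passes per step across the 32 ranks.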