Amandab/lc setup #153
Draft
tyler-romero wants to merge 175 commits into main from amandab/lc-setup
Commits (175)
3ed8215 Olmo3 long context training support
tyler-romero 9a04910 Ready to test
tyler-romero 3f39e04 modify workspace
tyler-romero cba833d revert workspace
tyler-romero e22d951 without weka
tyler-romero 2b413c8 fix max target sequence length
tyler-romero af0f6a4 generate_doc_lengths
tyler-romero 4a827e6 generate_doc_lengths
tyler-romero dc2c228 Try doubling size for fun
tyler-romero a1701f2 Update configs
tyler-romero 2d3f18f Try activation checkpointing for 131k context lengith
tyler-romero fbbb854 Fix typo
tyler-romero f86e882 Configs
tyler-romero 73199d0 tweaks
tyler-romero cccb7af Try 262k for yuks
tyler-romero 77f1af9 262k w/ more dp sharding
tyler-romero 6827649 more cp degrees
tyler-romero e0661df Working
tyler-romero 8ae35c5 Maybe 524k?
tyler-romero 0742014 0.5M context length recipie works
tyler-romero 0e4db63 revert datset
tyler-romero 0be66c1 baseline LC config
soldni 7266c87 updated config
soldni fdbfe97 fix syntax
soldni 7717a7a filename
soldni 32c7fb9 filename
soldni 9cc7502 indentation?
soldni 139f4f8 indentation?
soldni 90e1145 paths
soldni 1c201f6 diff workspace
soldni e84064a olmo3-mix
soldni 0ae9596 easy setup
soldni 5c60216 names
soldni 0a412a4 description
soldni a291cca removed space
soldni 9d131e9 downstream evaluators off
soldni 3af0771 distinct name
soldni 44d6d32 tweaking gc
soldni a2bdec0 larger BS
soldni 51acf0c .
soldni 666afd5 moved things
soldni fe5394a readme
soldni f50d27e Merge branch 'main' into soldni/from-tyler-lc
soldni c5062ad Update README.md
soldni debef18 commits
soldni 2bbf421 more configs
soldni 9490870 fixing rope
soldni db8feae name change
soldni abd9d21 path
soldni 76be382 restoring
soldni 9c7c567 skipping confirmation
soldni 39c31a7 skipping confirmation
soldni aed4d93 new config from 7T
soldni 2ac74f1 2M -> 16M batch size
abertsch72 a6be1f9 fix import for WSD
abertsch72 550f118 configs to launch olmo3 data sweep
abertsch72 8cb7d4c swap to 1b
abertsch72 a08cae9 new s2pdfs recipes
abertsch72 40a3b05 folder and update priority
abertsch72 96b7d8e swap to new s2pdfs
abertsch72 7d3a60d anti-gloo actions
abertsch72 efc02c4 hardcode no async save
abertsch72 f9bb20d try 8 node training instead...
abertsch72 293d4ef try 16 node training instead...
abertsch72 a7e47ce change save loc
abertsch72 8ef560e 32 node vers
abertsch72 67b04c6 restore trainer state from save folder
epwalsh ba5781b 8 nodes
abertsch72 735aeb2 resumption
abertsch72 64e109d remove group id
abertsch72 9f37135 turn off overwriting
abertsch72 42f249f silly fix
abertsch72 bd861e2 even sillier fix
abertsch72 b051b99 also mod workdir
abertsch72 0d02fb7 32 nodes
abertsch72 29a982a modded the wrong one
abertsch72 3426ede data loader single threaded
abertsch72 83607de 8 nodes
abertsch72 22b822e Merge branch 'main' of github.com:allenai/olmo-cookbook into amandab/…
abertsch72 6338a11 16 nodes on new setup
abertsch72 c1c35f2 add dolmino mix run
abertsch72 17fd19b untab data
abertsch72 8129d46 swap to version with gloo fixes
abertsch72 1c2f428 support for rope-scaling strategies
tyler-romero 1321028 post-SFT ckpts
abertsch72 8ab6079 post-SFT ckpts need LR specified
abertsch72 35fe8eb turning off anneal
abertsch72 f7c0039 2T runs
abertsch72 4122490 fixing naming
abertsch72 1eab8e6 fix paths
abertsch72 713ca9c add warmup
abertsch72 fd4a29f change weight decay to 0.1, hardcode alpha_f to 0 instead of 0.1
abertsch72 1ed8704 dirklike recipe
abertsch72 9697b6a revert to cookbook tokenizer naming
abertsch72 6a3e6b2 update optim settings for each
abertsch72 d1f0270 yolo run
abertsch72 b129f60 model path not model_and_optim path
abertsch72 579902e silly test-- nearby checkpoint
abertsch72 9b479ec fix the passthrough of yolo full
abertsch72 74bd148 test turning off annealing
abertsch72 d1a5251 first test version, wrong ckpt
abertsch72 f6fe52a correct ckpt
abertsch72 32c329d postanneal recipes
abertsch72 6039303 rerun for olmo 2.5
abertsch72 71cd3a2 postanneal for olmo3
abertsch72 0ea3770 move into my folder
abertsch72 b87e777 olmo25
soldni 9ba6379 rename
abertsch72 96070c2 support tp
tyler-romero 8534eac olmo29 scaling strats
tyler-romero 18a3f5a tp support
tyler-romero e785ec5 olmo2 configs
abertsch72 6e0d070 Merge branch 'amandab/lc-setup' of github.com:allenai/olmo-cookbook i…
abertsch72 3efe3d8 b -> B
abertsch72 e980643 olmo2.5 full attn
abertsch72 dc37449 .
tyler-romero e237743 fullattn 2.9 config
abertsch72 8e31f5b Merge branch 'amandab/lc-setup' of github.com:allenai/olmo-cookbook i…
abertsch72 1615e52 modify path
abertsch72 3784fc0 correct paths for longdep runs
abertsch72 a2be09f swap to 4 nodes
abertsch72 ce5a725 olmo2 config that uses flash attn
abertsch72 77d9e25 remove travel p80
abertsch72 5b6d2cb up to 8 nodes
abertsch72 75a4317 fix spread paths
abertsch72 d6d2832 fix LR, increased nodes
abertsch72 7a92991 8nodes
abertsch72 a4fd359 pin to commit in pyproject
abertsch72 445da92 140B and 280B anneals
abertsch72 26d995b 20B versions, dropping 10B to 4 nodes
abertsch72 f9aea0d 2.9 140B/280B configs
abertsch72 36f0530 fix name on 140B
abertsch72 a9a01c5 fix lr
abertsch72 47a4bf2 dolmino runs
abertsch72 4e0becc remove train module override
abertsch72 1e62b5f remove SC
abertsch72 273a2cd remove duplicate LC datasets
abertsch72 5861ef0 swap checkpoint location
abertsch72 c4df7c5 fix source duplication
abertsch72 f23c0c9 bump to 8 nodes
abertsch72 b1594bf retrofit configs v1
abertsch72 3a66e91 olmo28 run
abertsch72 9f1ab66 full attn runs for olmo 2.9 and 2.5 at 140B
abertsch72 e278c4b swa for olmo2
abertsch72 8ca63c8 halfcontext 2.5 LC extensions
abertsch72 4e4f874 gqa only run
abertsch72 0ffcdd7 retrofit 250B runs
abertsch72 4bb6626 float8 train
abertsch72 8198528 add llamalike extension
abertsch72 97becea remove copy from end of name
abertsch72 6a52906 4T and 6T runs
abertsch72 73ce19d reretro run
abertsch72 125bcf5 use flash attn for llama 3 config
abertsch72 2c698b1 yarn scaling reretro
abertsch72 ede8ed4 hotfix for cluster rename
abertsch72 58963d9 configs using full attn for reretro
abertsch72 e20f28a 32k ckpt
abertsch72 89baa08 recipe for 32k reretro with fancy data
abertsch72 a80ba53 long context runs
abertsch72 7e0e082 gqa 5T run
abertsch72 d67be3c fix name and priority/num gpus
abertsch72 64d72aa extension run for headnorm 140b
abertsch72 df8f10c titan swap
abertsch72 2ec399f fix priority for titan
abertsch72 cc660a5 half context runs
abertsch72 a0d1c86 llama without qk norm result
abertsch72 739f410 remove headwise norm from llama clone
abertsch72 7b6bb1d move to jupiter
abertsch72 5b96bba swap back to augusta
abertsch72 9371a95 swap to old cluster name
abertsch72 8e3df5c test single gpu version
abertsch72 4d25dfc 1 node training
abertsch72 b7a3013 nccl fix
abertsch72 bb39f3f 4 nodes
abertsch72 092e4bb 8 nodes
Changes from 1 commit: b1594bfbd05da701f77519d9ffacd8f15e70d358 (retrofit configs v1)