Amandab/lc setup#153

Draft
tyler-romero wants to merge 175 commits into main from amandab/lc-setup

175 commits
3ed8215 Olmo3 long context training support (tyler-romero, Jun 30, 2025)
9a04910 Ready to test (tyler-romero, Jul 2, 2025)
3f39e04 modify workspace (tyler-romero, Jul 2, 2025)
cba833d revert workspace (tyler-romero, Jul 2, 2025)
e22d951 without weka (tyler-romero, Jul 2, 2025)
2b413c8 fix max target sequence length (tyler-romero, Jul 2, 2025)
af0f6a4 generate_doc_lengths (tyler-romero, Jul 2, 2025)
4a827e6 generate_doc_lengths (tyler-romero, Jul 2, 2025)
dc2c228 Try doubling size for fun (tyler-romero, Jul 2, 2025)
a1701f2 Update configs (tyler-romero, Jul 2, 2025)
2d3f18f Try activation checkpointing for 131k context lengith (tyler-romero, Jul 2, 2025)
fbbb854 Fix typo (tyler-romero, Jul 2, 2025)
f86e882 Configs (tyler-romero, Jul 2, 2025)
73199d0 tweaks (tyler-romero, Jul 2, 2025)
cccb7af Try 262k for yuks (tyler-romero, Jul 2, 2025)
77f1af9 262k w/ more dp sharding (tyler-romero, Jul 2, 2025)
6827649 more cp degrees (tyler-romero, Jul 3, 2025)
e0661df Working (tyler-romero, Jul 3, 2025)
8ae35c5 Maybe 524k? (tyler-romero, Jul 3, 2025)
0742014 0.5M context length recipie works (tyler-romero, Jul 3, 2025)
0e4db63 revert datset (tyler-romero, Jul 3, 2025)
0be66c1 baseline LC config (soldni, Jul 9, 2025)
7266c87 updated config (soldni, Jul 9, 2025)
fdbfe97 fix syntax (soldni, Jul 9, 2025)
7717a7a filename (soldni, Jul 9, 2025)
32c7fb9 filename (soldni, Jul 9, 2025)
9cc7502 indentation? (soldni, Jul 9, 2025)
139f4f8 indentation? (soldni, Jul 9, 2025)
90e1145 paths (soldni, Jul 9, 2025)
1c201f6 diff workspace (soldni, Jul 9, 2025)
e84064a olmo3-mix (soldni, Jul 9, 2025)
0ae9596 easy setup (soldni, Jul 9, 2025)
5c60216 names (soldni, Jul 9, 2025)
0a412a4 description (soldni, Jul 9, 2025)
a291cca removed space (soldni, Jul 9, 2025)
9d131e9 downstream evaluators off (soldni, Jul 9, 2025)
3af0771 distinct name (soldni, Jul 9, 2025)
44d6d32 tweaking gc (soldni, Jul 10, 2025)
a2bdec0 larger BS (soldni, Jul 10, 2025)
51acf0c . (soldni, Jul 10, 2025)
666afd5 moved things (soldni, Jul 10, 2025)
fe5394a readme (soldni, Jul 10, 2025)
f50d27e Merge branch 'main' into soldni/from-tyler-lc (soldni, Jul 10, 2025)
c5062ad Update README.md (soldni, Jul 10, 2025)
debef18 commits (soldni, Jul 10, 2025)
2bbf421 more configs (soldni, Jul 10, 2025)
9490870 fixing rope (soldni, Jul 10, 2025)
db8feae name change (soldni, Jul 10, 2025)
abd9d21 path (soldni, Jul 10, 2025)
76be382 restoring (soldni, Jul 11, 2025)
9c7c567 skipping confirmation (soldni, Jul 11, 2025)
39c31a7 skipping confirmation (soldni, Jul 11, 2025)
aed4d93 new config from 7T (soldni, Jul 11, 2025)
2ac74f1 2M -> 16M batch size (abertsch72, Jul 11, 2025)
a6be1f9 fix import for WSD (abertsch72, Jul 11, 2025)
550f118 configs to launch olmo3 data sweep (abertsch72, Jul 11, 2025)
8cb7d4c swap to 1b (abertsch72, Jul 25, 2025)
a08cae9 new s2pdfs recipes (abertsch72, Jul 26, 2025)
40a3b05 folder and update priority (abertsch72, Jul 26, 2025)
96b7d8e swap to new s2pdfs (abertsch72, Jul 26, 2025)
7d3a60d anti-gloo actions (abertsch72, Jul 26, 2025)
efc02c4 hardcode no async save (abertsch72, Jul 26, 2025)
f9bb20d try 8 node training instead... (abertsch72, Jul 26, 2025)
293d4ef try 16 node training instead... (abertsch72, Jul 26, 2025)
a7e47ce change save loc (abertsch72, Jul 26, 2025)
8ef560e 32 node vers (abertsch72, Jul 27, 2025)
67b04c6 restore trainer state from save folder (epwalsh, Jul 3, 2025)
ba5781b 8 nodes (abertsch72, Jul 27, 2025)
735aeb2 resumption (abertsch72, Jul 27, 2025)
64e109d remove group id (abertsch72, Jul 27, 2025)
9f37135 turn off overwriting (abertsch72, Jul 27, 2025)
42f249f silly fix (abertsch72, Jul 27, 2025)
bd861e2 even sillier fix (abertsch72, Jul 27, 2025)
b051b99 also mod workdir (abertsch72, Jul 27, 2025)
0d02fb7 32 nodes (abertsch72, Jul 27, 2025)
29a982a modded the wrong one (abertsch72, Jul 27, 2025)
3426ede data loader single threaded (abertsch72, Jul 27, 2025)
83607de 8 nodes (abertsch72, Jul 27, 2025)
22b822e Merge branch 'main' of github.com:allenai/olmo-cookbook into amandab/… (abertsch72, Jul 28, 2025)
6338a11 16 nodes on new setup (abertsch72, Jul 28, 2025)
c1c35f2 add dolmino mix run (abertsch72, Jul 28, 2025)
17fd19b untab data (abertsch72, Jul 28, 2025)
8129d46 swap to version with gloo fixes (abertsch72, Jul 29, 2025)
1c2f428 support for rope-scaling strategies (tyler-romero, Jul 29, 2025)
1321028 post-SFT ckpts (abertsch72, Aug 5, 2025)
8ab6079 post-SFT ckpts need LR specified (abertsch72, Aug 5, 2025)
35fe8eb turning off anneal (abertsch72, Aug 5, 2025)
f7c0039 2T runs (abertsch72, Aug 6, 2025)
4122490 fixing naming (abertsch72, Aug 6, 2025)
1eab8e6 fix paths (abertsch72, Aug 6, 2025)
713ca9c add warmup (abertsch72, Aug 6, 2025)
fd4a29f change weight decay to 0.1, hardcode alpha_f to 0 instead of 0.1 (abertsch72, Aug 8, 2025)
1ed8704 dirklike recipe (abertsch72, Aug 8, 2025)
9697b6a revert to cookbook tokenizer naming (abertsch72, Aug 8, 2025)
6a3e6b2 update optim settings for each (abertsch72, Aug 9, 2025)
d1f0270 yolo run (abertsch72, Aug 9, 2025)
b129f60 model path not model_and_optim path (abertsch72, Aug 9, 2025)
579902e silly test-- nearby checkpoint (abertsch72, Aug 9, 2025)
9b479ec fix the passthrough of yolo full (abertsch72, Aug 9, 2025)
74bd148 test turning off annealing (abertsch72, Aug 9, 2025)
d1a5251 first test version, wrong ckpt (abertsch72, Aug 9, 2025)
f6fe52a correct ckpt (abertsch72, Aug 9, 2025)
32c329d postanneal recipes (abertsch72, Aug 9, 2025)
6039303 rerun for olmo 2.5 (abertsch72, Aug 9, 2025)
71cd3a2 postanneal for olmo3 (abertsch72, Aug 9, 2025)
0ea3770 move into my folder (abertsch72, Aug 9, 2025)
b87e777 olmo25 (soldni, Aug 6, 2025)
9ba6379 rename (abertsch72, Aug 9, 2025)
96070c2 support tp (tyler-romero, Aug 10, 2025)
8534eac olmo29 scaling strats (tyler-romero, Aug 10, 2025)
18a3f5a tp support (tyler-romero, Aug 10, 2025)
e785ec5 olmo2 configs (abertsch72, Aug 10, 2025)
6e0d070 Merge branch 'amandab/lc-setup' of github.com:allenai/olmo-cookbook i… (abertsch72, Aug 10, 2025)
3efe3d8 b -> B (abertsch72, Aug 10, 2025)
e980643 olmo2.5 full attn (abertsch72, Aug 10, 2025)
dc37449 . (tyler-romero, Aug 10, 2025)
e237743 fullattn 2.9 config (abertsch72, Aug 10, 2025)
8e31f5b Merge branch 'amandab/lc-setup' of github.com:allenai/olmo-cookbook i… (abertsch72, Aug 10, 2025)
1615e52 modify path (abertsch72, Aug 10, 2025)
3784fc0 correct paths for longdep runs (abertsch72, Aug 11, 2025)
a2be09f swap to 4 nodes (abertsch72, Aug 11, 2025)
ce5a725 olmo2 config that uses flash attn (abertsch72, Aug 11, 2025)
77d9e25 remove travel p80 (abertsch72, Aug 11, 2025)
5b6d2cb up to 8 nodes (abertsch72, Aug 11, 2025)
75a4317 fix spread paths (abertsch72, Aug 11, 2025)
d6d2832 fix LR, increased nodes (abertsch72, Aug 13, 2025)
7a92991 8nodes (abertsch72, Aug 13, 2025)
a4fd359 pin to commit in pyproject (abertsch72, Aug 13, 2025)
445da92 140B and 280B anneals (abertsch72, Aug 13, 2025)
26d995b 20B versions, dropping 10B to 4 nodes (abertsch72, Aug 13, 2025)
f9aea0d 2.9 140B/280B configs (abertsch72, Aug 16, 2025)
36f0530 fix name on 140B (abertsch72, Aug 16, 2025)
a9a01c5 fix lr (abertsch72, Aug 16, 2025)
47a4bf2 dolmino runs (abertsch72, Aug 16, 2025)
4e0becc remove train module override (abertsch72, Aug 16, 2025)
1e62b5f remove SC (abertsch72, Aug 16, 2025)
273a2cd remove duplicate LC datasets (abertsch72, Aug 16, 2025)
5861ef0 swap checkpoint location (abertsch72, Aug 18, 2025)
c4df7c5 fix source duplication (abertsch72, Aug 18, 2025)
f23c0c9 bump to 8 nodes (abertsch72, Aug 18, 2025)
b1594bf retrofit configs v1 (abertsch72, Aug 22, 2025)
3a66e91 olmo28 run (abertsch72, Aug 23, 2025)
9f1ab66 full attn runs for olmo 2.9 and 2.5 at 140B (abertsch72, Aug 27, 2025)
e278c4b swa for olmo2 (abertsch72, Aug 27, 2025)
8ca63c8 halfcontext 2.5 LC extensions (abertsch72, Aug 27, 2025)
4e4f874 gqa only run (abertsch72, Aug 28, 2025)
0ffcdd7 retrofit 250B runs (abertsch72, Aug 29, 2025)
4bb6626 float8 train (abertsch72, Aug 29, 2025)
8198528 add llamalike extension (abertsch72, Sep 1, 2025)
97becea remove copy from end of name (abertsch72, Sep 1, 2025)
6a52906 4T and 6T runs (abertsch72, Sep 1, 2025)
73ce19d reretro run (abertsch72, Sep 5, 2025)
125bcf5 use flash attn for llama 3 config (abertsch72, Sep 7, 2025)
2c698b1 yarn scaling reretro (abertsch72, Sep 8, 2025)
ede8ed4 hotfix for cluster rename (abertsch72, Sep 9, 2025)
58963d9 configs using full attn for reretro (abertsch72, Sep 9, 2025)
e20f28a 32k ckpt (abertsch72, Sep 12, 2025)
89baa08 recipe for 32k reretro with fancy data (abertsch72, Sep 12, 2025)
a80ba53 long context runs (abertsch72, Sep 12, 2025)
7e0e082 gqa 5T run (abertsch72, Oct 8, 2025)
d67be3c fix name and priority/num gpus (abertsch72, Oct 8, 2025)
64d72aa extension run for headnorm 140b (abertsch72, Oct 13, 2025)
df8f10c titan swap (abertsch72, Oct 13, 2025)
2ec399f fix priority for titan (abertsch72, Oct 13, 2025)
cc660a5 half context runs (abertsch72, Oct 13, 2025)
a0d1c86 llama without qk norm result (abertsch72, Oct 31, 2025)
739f410 remove headwise norm from llama clone (abertsch72, Oct 31, 2025)
7b6bb1d move to jupiter (abertsch72, Oct 31, 2025)
5b96bba swap back to augusta (abertsch72, Nov 4, 2025)
9371a95 swap to old cluster name (abertsch72, Nov 6, 2025)
8e3df5c test single gpu version (abertsch72, Nov 6, 2025)
4d25dfc 1 node training (abertsch72, Nov 6, 2025)
b7a3013 nccl fix (abertsch72, Nov 6, 2025)
bb39f3f 4 nodes (abertsch72, Nov 7, 2025)
092e4bb 8 nodes (abertsch72, Nov 7, 2025)

retrofit configs v1
abertsch72 committed Aug 22, 2025
commit b1594bfbd05da701f77519d9ffacd8f15e70d358