Amandab/lc setup #153
Draft
tyler-romero wants to merge 175 commits into main from amandab/lc-setup
Commits (175)
3ed8215 Olmo3 long context training support
tyler-romero 9a04910 Ready to test
tyler-romero 3f39e04 modify workspace
tyler-romero cba833d revert workspace
tyler-romero e22d951 without weka
tyler-romero 2b413c8 fix max target sequence length
tyler-romero af0f6a4 generate_doc_lengths
tyler-romero 4a827e6 generate_doc_lengths
tyler-romero dc2c228 Try doubling size for fun
tyler-romero a1701f2 Update configs
tyler-romero 2d3f18f Try activation checkpointing for 131k context lengith
tyler-romero fbbb854 Fix typo
tyler-romero f86e882 Configs
tyler-romero 73199d0 tweaks
tyler-romero cccb7af Try 262k for yuks
tyler-romero 77f1af9 262k w/ more dp sharding
tyler-romero 6827649 more cp degrees
tyler-romero e0661df Working
tyler-romero 8ae35c5 Maybe 524k?
tyler-romero 0742014 0.5M context length recipie works
tyler-romero 0e4db63 revert datset
tyler-romero 0be66c1 baseline LC config
soldni 7266c87 updated config
soldni fdbfe97 fix syntax
soldni 7717a7a filename
soldni 32c7fb9 filename
soldni 9cc7502 indentation?
soldni 139f4f8 indentation?
soldni 90e1145 paths
soldni 1c201f6 diff workspace
soldni e84064a olmo3-mix
soldni 0ae9596 easy setup
soldni 5c60216 names
soldni 0a412a4 description
soldni a291cca removed space
soldni 9d131e9 downstream evaluators off
soldni 3af0771 distinct name
soldni 44d6d32 tweaking gc
soldni a2bdec0 larger BS
soldni 51acf0c .
soldni 666afd5 moved things
soldni fe5394a readme
soldni f50d27e Merge branch 'main' into soldni/from-tyler-lc
soldni c5062ad Update README.md
soldni debef18 commits
soldni 2bbf421 more configs
soldni 9490870 fixing rope
soldni db8feae name change
soldni abd9d21 path
soldni 76be382 restoring
soldni 9c7c567 skipping confirmation
soldni 39c31a7 skipping confirmation
soldni aed4d93 new config from 7T
soldni 2ac74f1 2M -> 16M batch size
abertsch72 a6be1f9 fix import for WSD
abertsch72 550f118 configs to launch olmo3 data sweep
abertsch72 8cb7d4c swap to 1b
abertsch72 a08cae9 new s2pdfs recipes
abertsch72 40a3b05 folder and update priority
abertsch72 96b7d8e swap to new s2pdfs
abertsch72 7d3a60d anti-gloo actions
abertsch72 efc02c4 hardcode no async save
abertsch72 f9bb20d try 8 node training instead...
abertsch72 293d4ef try 16 node training instead...
abertsch72 a7e47ce change save loc
abertsch72 8ef560e 32 node vers
abertsch72 67b04c6 restore trainer state from save folder
epwalsh ba5781b 8 nodes
abertsch72 735aeb2 resumption
abertsch72 64e109d remove group id
abertsch72 9f37135 turn off overwriting
abertsch72 42f249f silly fix
abertsch72 bd861e2 even sillier fix
abertsch72 b051b99 also mod workdir
abertsch72 0d02fb7 32 nodes
abertsch72 29a982a modded the wrong one
abertsch72 3426ede data loader single threaded
abertsch72 83607de 8 nodes
abertsch72 22b822e Merge branch 'main' of github.com:allenai/olmo-cookbook into amandab/…
abertsch72 6338a11 16 nodes on new setup
abertsch72 c1c35f2 add dolmino mix run
abertsch72 17fd19b untab data
abertsch72 8129d46 swap to version with gloo fixes
abertsch72 1c2f428 support for rope-scaling strategies
tyler-romero 1321028 post-SFT ckpts
abertsch72 8ab6079 post-SFT ckpts need LR specified
abertsch72 35fe8eb turning off anneal
abertsch72 f7c0039 2T runs
abertsch72 4122490 fixing naming
abertsch72 1eab8e6 fix paths
abertsch72 713ca9c add warmup
abertsch72 fd4a29f change weight decay to 0.1, hardcode alpha_f to 0 instead of 0.1
abertsch72 1ed8704 dirklike recipe
abertsch72 9697b6a revert to cookbook tokenizer naming
abertsch72 6a3e6b2 update optim settings for each
abertsch72 d1f0270 yolo run
abertsch72 b129f60 model path not model_and_optim path
abertsch72 579902e silly test-- nearby checkpoint
abertsch72 9b479ec fix the passthrough of yolo full
abertsch72 74bd148 test turning off annealing
abertsch72 d1a5251 first test version, wrong ckpt
abertsch72 f6fe52a correct ckpt
abertsch72 32c329d postanneal recipes
abertsch72 6039303 rerun for olmo 2.5
abertsch72 71cd3a2 postanneal for olmo3
abertsch72 0ea3770 move into my folder
abertsch72 b87e777 olmo25
soldni 9ba6379 rename
abertsch72 96070c2 support tp
tyler-romero 8534eac olmo29 scaling strats
tyler-romero 18a3f5a tp support
tyler-romero e785ec5 olmo2 configs
abertsch72 6e0d070 Merge branch 'amandab/lc-setup' of github.com:allenai/olmo-cookbook i…
abertsch72 3efe3d8 b -> B
abertsch72 e980643 olmo2.5 full attn
abertsch72 dc37449 .
tyler-romero e237743 fullattn 2.9 config
abertsch72 8e31f5b Merge branch 'amandab/lc-setup' of github.com:allenai/olmo-cookbook i…
abertsch72 1615e52 modify path
abertsch72 3784fc0 correct paths for longdep runs
abertsch72 a2be09f swap to 4 nodes
abertsch72 ce5a725 olmo2 config that uses flash attn
abertsch72 77d9e25 remove travel p80
abertsch72 5b6d2cb up to 8 nodes
abertsch72 75a4317 fix spread paths
abertsch72 d6d2832 fix LR, increased nodes
abertsch72 7a92991 8nodes
abertsch72 a4fd359 pin to commit in pyproject
abertsch72 445da92 140B and 280B anneals
abertsch72 26d995b 20B versions, dropping 10B to 4 nodes
abertsch72 f9aea0d 2.9 140B/280B configs
abertsch72 36f0530 fix name on 140B
abertsch72 a9a01c5 fix lr
abertsch72 47a4bf2 dolmino runs
abertsch72 4e0becc remove train module override
abertsch72 1e62b5f remove SC
abertsch72 273a2cd remove duplicate LC datasets
abertsch72 5861ef0 swap checkpoint location
abertsch72 c4df7c5 fix source duplication
abertsch72 f23c0c9 bump to 8 nodes
abertsch72 b1594bf retrofit configs v1
abertsch72 3a66e91 olmo28 run
abertsch72 9f1ab66 full attn runs for olmo 2.9 and 2.5 at 140B
abertsch72 e278c4b swa for olmo2
abertsch72 8ca63c8 halfcontext 2.5 LC extensions
abertsch72 4e4f874 gqa only run
abertsch72 0ffcdd7 retrofit 250B runs
abertsch72 4bb6626 float8 train
abertsch72 8198528 add llamalike extension
abertsch72 97becea remove copy from end of name
abertsch72 6a52906 4T and 6T runs
abertsch72 73ce19d reretro run
abertsch72 125bcf5 use flash attn for llama 3 config
abertsch72 2c698b1 yarn scaling reretro
abertsch72 ede8ed4 hotfix for cluster rename
abertsch72 58963d9 configs using full attn for reretro
abertsch72 e20f28a 32k ckpt
abertsch72 89baa08 recipe for 32k reretro with fancy data
abertsch72 a80ba53 long context runs
abertsch72 7e0e082 gqa 5T run
abertsch72 d67be3c fix name and priority/num gpus
abertsch72 64d72aa extension run for headnorm 140b
abertsch72 df8f10c titan swap
abertsch72 2ec399f fix priority for titan
abertsch72 cc660a5 half context runs
abertsch72 a0d1c86 llama without qk norm result
abertsch72 739f410 remove headwise norm from llama clone
abertsch72 7b6bb1d move to jupiter
abertsch72 5b96bba swap back to augusta
abertsch72 9371a95 swap to old cluster name
abertsch72 8e3df5c test single gpu version
abertsch72 4d25dfc 1 node training
abertsch72 b7a3013 nccl fix
abertsch72 bb39f3f 4 nodes
abertsch72 092e4bb 8 nodes
Changes from 1 commit: b1594bfbd05da701f77519d9ffacd8f15e70d358 (retrofit configs v1)