Commit bc6f335

Olmo 3 from scratch (#914)

* Olmo 3 from scratch
* update
* update
* update

1 parent 398b079, commit bc6f335

14 files changed: +3161 -56 lines changed

‎.github/workflows/basic-tests-linux-uv.yml‎

Lines changed: 2 additions & 0 deletions
@@ -57,6 +57,8 @@ jobs:
  pytest ch05/11_qwen3/tests/test_qwen3_nb.py
  pytest ch05/12_gemma3/tests/test_gemma3_nb.py
  pytest ch05/12_gemma3/tests/test_gemma3_kv_nb.py
+ pytest ch05/13_olmo3/tests/test_olmo3_nb.py
+ pytest ch05/13_olmo3/tests/test_olmo3_kvcache_nb.py
  pytest ch06/01_main-chapter-code/tests.py

  - name: Validate Selected Jupyter Notebooks (uv)

‎.gitignore‎

Lines changed: 10 additions & 0 deletions
@@ -70,6 +70,16 @@ ch05/11_qwen3/Qwen3-8B
  ch05/11_qwen3/Qwen3-8B-Base
  ch05/11_qwen3/Qwen3-32B
  ch05/11_qwen3/Qwen3-32B-Base
+ ch05/12_gemma3/gemma-3-270M-it
+ ch05/12_gemma3/gemma-3-270M
+ ch05/13_olmo3/Olmo-3-1025-7B
+ ch05/13_olmo3/Olmo-3-1125-32B
+ ch05/13_olmo3/Olmo-3-7B-Instruct
+ ch05/13_olmo3/Olmo-3-32B-Instruct
+ ch05/13_olmo3/Olmo-3-7B-Think
+ ch05/13_olmo3/Olmo-3-32B-Think
+ ch05/13_olmo3/Olmo-3-7B-RLZero-IF
+ ch05/13_olmo3/Olmo-3-32B-RLZero-IF

  ch06/01_main-chapter-code/gpt2
  ch06/02_bonus_additional-experiments/gpt2

‎README.md‎

Lines changed: 12 additions & 12 deletions
@@ -179,19 +179,19 @@ Several folders contain optional materials as a bonus for interested readers:
  - [Optimizing Hyperparameters for Pretraining](ch05/05_bonus_hparam_tuning)
  - [Building a User Interface to Interact With the Pretrained LLM](ch05/06_user_interface)
  - [Converting GPT to Llama](ch05/07_gpt_to_llama)
- - [Llama 3.2 From Scratch](ch05/07_gpt_to_llama/standalone-llama32.ipynb)
- - [Qwen3 Dense and Mixture-of-Experts (MoE) From Scratch](ch05/11_qwen3/)
- - [Gemma 3 From Scratch](ch05/12_gemma3/)
- - [Memory-Efficient Model Weight Loading](ch05/08_memory_efficient_weight_loading/memory-efficient-state-dict.ipynb)
- - [Extending the Tiktoken BPE Tokenizer With New Tokens](ch05/09_extending-tokenizers/extend-tiktoken.ipynb)
+ - [Memory-efficient Model Weight Loading](ch05/08_memory_efficient_weight_loading/memory-efficient-state-dict.ipynb)
+ - [Extending the Tiktoken BPE Tokenizer with New Tokens](ch05/09_extending-tokenizers/extend-tiktoken.ipynb)
  - [PyTorch Performance Tips for Faster LLM Training](ch05/10_llm-training-speed)
-
- - **Chapter 6: Finetuning for Classification**
- - [Additional Experiments Finetuning Different Layers and Using Larger Models](ch06/02_bonus_additional-experiments)
- - [Finetuning Different Models on 50k IMDb Movie Review Dataset](ch06/03_bonus_imdb-classification)
- - [Building a User Interface to Interact With the GPT-Based Spam Classifier](ch06/04_user_interface)
-
- - **Chapter 7: Finetuning to Follow Instructions**
+ - [LLM Architectures](ch05/#llm-architectures-from-scratch)
+ - [Llama 3.2 From Scratch](ch05/07_gpt_to_llama/standalone-llama32.ipynb)
+ - [Qwen3 Dense and Mixture-of-Experts (MoE) From Scratch](ch05/11_qwen3/)
+ - [Gemma 3 From Scratch](ch05/12_gemma3/)
+ - [Olmo 3 From Scratch](ch05/13_olmo3/)
+ - **Chapter 6: Finetuning for classification**
+ - [Additional experiments finetuning different layers and using larger models](ch06/02_bonus_additional-experiments)
+ - [Finetuning different models on 50k IMDb movie review dataset](ch06/03_bonus_imdb-classification)
+ - [Building a User Interface to Interact With the GPT-based Spam Classifier](ch06/04_user_interface)
+ - **Chapter 7: Finetuning to follow instructions**
  - [Dataset Utilities for Finding Near Duplicates and Creating Passive Voice Entries](ch07/02_dataset-utilities)
  - [Evaluating Instruction Responses Using the OpenAI API and Ollama](ch07/03_model-evaluation)
  - [Generating a Dataset for Instruction Finetuning](ch07/05_dataset-generation/llama3-ollama.ipynb)

‎ch05/11_qwen3/standalone-qwen3-moe-plus-kvcache.ipynb‎

Lines changed: 1 addition & 1 deletion
@@ -1223,7 +1223,7 @@
  "name": "python",
  "nbconvert_exporter": "python",
  "pygments_lexer": "ipython3",
- "version": "3.13.5"
+ "version": "3.12.3"
  }
  },
  "nbformat": 4,

‎ch05/11_qwen3/standalone-qwen3-plus-kvcache.ipynb‎

Lines changed: 1 addition & 1 deletion
@@ -1253,7 +1253,7 @@
  "name": "python",
  "nbconvert_exporter": "python",
  "pygments_lexer": "ipython3",
- "version": "3.13.5"
+ "version": "3.12.3"
  }
  },
  "nbformat": 4,

‎ch05/11_qwen3/standalone-qwen3.ipynb‎

Lines changed: 1 addition & 1 deletion
@@ -1179,7 +1179,7 @@
  "name": "python",
  "nbconvert_exporter": "python",
  "pygments_lexer": "ipython3",
- "version": "3.13.5"
+ "version": "3.12.3"
  }
  },
  "nbformat": 4,

‎ch05/12_gemma3/standalone-gemma3-plus-kvcache.ipynb‎

Lines changed: 86 additions & 25 deletions
@@ -78,9 +78,9 @@
  "name": "stdout",
  "output_type": "stream",
  "text": [
- "huggingface_hub version: 0.34.4\n",
- "tokenizers version: 0.21.4\n",
- "torch version: 2.8.0\n"
+ "huggingface_hub version: 0.35.0\n",
+ "tokenizers version: 0.22.1\n",
+ "torch version: 2.9.0+cu130\n"
  ]
  }
  ],
@@ -700,9 +700,9 @@
  {
  "data": {
  "text/plain": [
- "tensor([[[ 0.7500, 0.1060, 0.4844, ..., 0.9414, 0.3984, -0.2324],\n",
- " [-0.3438, -0.0549, 0.8984, ..., -0.2402, 0.4570, 0.8242],\n",
- " [-0.2676, -0.3281, 0.4121, ..., 0.8711, -0.9648, 0.9844]]],\n",
+ "tensor([[[ 0.7500, 0.1011, 0.4863, ..., 0.9414, 0.3984, -0.2285],\n",
+ " [-0.3398, -0.0564, 0.9023, ..., -0.2480, 0.4551, 0.8203],\n",
+ " [-0.2695, -0.3242, 0.4121, ..., 0.8672, -0.9688, 0.9844]]],\n",
  " dtype=torch.bfloat16, grad_fn=<UnsafeViewBackward0>)"
  ]
  },
@@ -806,7 +806,20 @@
  "metadata": {
  "id": "31f12baf-f79b-499f-85c0-51328a6a20f5"
  },
- "outputs": [],
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/home/rasbt/jupyterlab/reasoning/.venv/lib/python3.12/site-packages/torch/cuda/__init__.py:283: UserWarning:\n",
+ " Found GPU0 NVIDIA GB10 which is of cuda capability 12.1.\n",
+ " Minimum and Maximum cuda capability supported by this version of PyTorch is\n",
+ " (8.0) - (12.0)\n",
+ "\n",
+ " warnings.warn(\n"
+ ]
+ }
+ ],
  "source": [
  "if torch.cuda.is_available():\n",
  "    device = torch.device(\"cuda\")\n",
@@ -1038,6 +1051,20 @@
  "outputId": "55b2f28c-142f-4698-9d23-d27456d3ed6d"
  },
  "outputs": [
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "3396c08eab3f4cf980023483b969a337",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "model.safetensors: 0%| | 0.00/536M [00:00<?, ?B/s]"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
  {
  "name": "stdout",
  "output_type": "stream",
@@ -1131,7 +1158,22 @@
  "execution_count": 22,
  "id": "7b6df8bc-7308-468e-93ce-2d5529ea7866",
  "metadata": {},
- "outputs": [],
+ "outputs": [
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "39b7b77c5c3448cdbd48fcde4e1b1a57",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "tokenizer.json: 0%| | 0.00/33.4M [00:00<?, ?B/s]"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
  "source": [
  "tokenizer_file_path = os.path.join(local_dir, \"tokenizer.json\")\n",
  "if not os.path.exists(tokenizer_file_path):\n",
@@ -1195,60 +1237,80 @@
  },
  {
  "cell_type": "code",
- "execution_count": 25,
+ "execution_count": 27,
  "id": "7b8401c6-e244-4cb7-9849-2ba71ce758d5",
  "metadata": {
  "id": "7b8401c6-e244-4cb7-9849-2ba71ce758d5"
  },
  "outputs": [],
  "source": [
- "def generate_text_basic_stream(model, token_ids, max_new_tokens,\n",
- "                               eos_token_id=None):\n",
- "\n",
+ "def generate_text_basic_stream(model, token_ids, max_new_tokens, eos_token_id=None, context_size=None):\n",
  "    model.eval()\n",
+ "\n",
  "    with torch.no_grad():\n",
+ "        cache = KVCache(n_layers=model.cfg[\"n_layers\"])\n",
+ "        model.reset_kv_cache()\n",
+ "\n",
+ "        # Prime the cache with the initial context\n",
+ "        logits = model(token_ids, cache=cache)\n",
+ "\n",
  "        for _ in range(max_new_tokens):\n",
- "            out = model(token_ids)[:, -1]\n",
- "            next_token = torch.argmax(out, dim=-1, keepdim=True)\n",
+ "            next_token = torch.argmax(logits[:, -1], dim=-1, keepdim=True)\n",
  "\n",
- "            if (eos_token_id is not None\n",
- "                and torch.all(next_token == eos_token_id)):\n",
+ "            if eos_token_id is not None and torch.all(next_token == eos_token_id):\n",
  "                break\n",
  "\n",
- "            yield next_token  # New: Yield each token as it's generated\n",
- "\n",
- "            token_ids = torch.cat([token_ids, next_token], dim=1)"
+ "            yield next_token\n",
+ "\n",
+ "            token_ids = torch.cat([token_ids, next_token], dim=1)\n",
+ "\n",
+ "            # Feed only the new token to the model; cache handles history\n",
+ "            logits = model(next_token, cache=cache)"
  ]
  },
  {
  "cell_type": "code",
- "execution_count": 26,
+ "execution_count": 28,
  "id": "56c9d0cf-25e9-4375-8d5c-368fa6911fdf",
  "metadata": {},
  "outputs": [
  {
  "name": "stdout",
  "output_type": "stream",
  "text": [
- "Large language models (LLMs) are sophisticated artificial intelligence systems that can understand, generate, and manipulate human language. They are trained on massive amounts of text data to learn patterns and relationships within language, enabling them to perform a wide range of tasks, from writing articles and answering questions to translating languages and summarizing information.\n"
+ "Large language models (LLMs) are sophisticated artificial intelligence systems that can understand, generate, and manipulate human language. They are trained on massive amounts of text data to learn patterns and relationships within that data, enabling them to perform a wide range of tasks, from writing articles and answering questions to translating languages and summarizing information.\n",
+ "\n",
+ "\n",
+ "GPU memory used: 0.96 GB\n"
  ]
  }
  ],
  "source": [
  "input_token_ids_tensor = torch.tensor(input_token_ids, device=device).unsqueeze(0)\n",
  "\n",
+ "\n",
+ "if torch.cuda.is_available():\n",
+ "    torch.cuda.reset_peak_memory_stats()\n",
+ "\n",
+ "\n",
  "for token in generate_text_basic_stream(\n",
  "    model=model,\n",
  "    token_ids=input_token_ids_tensor,\n",
- "    max_new_tokens=150,\n",
+ "    max_new_tokens=500,\n",
  "    eos_token_id=tokenizer.encode(\"<end_of_turn>\")[-1]\n",
  "):\n",
  "    token_id = token.squeeze(0).tolist()\n",
  "    print(\n",
  "        tokenizer.decode(token_id),\n",
  "        end=\"\",\n",
  "        flush=True\n",
- "    )"
+ "    )\n",
+ "\n",
+ "if torch.cuda.is_available():\n",
+ "    def gpu_gb(x):\n",
+ "        return f\"{x / 1024 / 1024 / 1024:.2f} GB\"\n",
+ "\n",
+ "    print(f\"\\n\\nGPU memory used: {gpu_gb(torch.cuda.max_memory_allocated())}\")"
  ]
  },
  {
{
@@ -1269,7 +1331,6 @@
12691331
"id":"e6edaaae-2de1-406c-8ffa-897cdfa3808c"
12701332
},
12711333
"source": [
1272-
"- Check out the [README.md](./README.md), to use this model via the `llms_from_scratch` package\n",
12731334
"- For those interested in a comprehensive guide on building a large language model from scratch and gaining a deeper understanding of its mechanics, you might like my [Build a Large Language Model (From Scratch)](http://mng.bz/orYv)\n",
12741335
"\n",
12751336
"<a href=\"http://mng.bz/orYv\"><img src=\"https://sebastianraschka.com/images/LLMs-from-scratch-images/cover-small.webp\" width=\"100px\"></a>"
@@ -1297,7 +1358,7 @@
  "name": "python",
  "nbconvert_exporter": "python",
  "pygments_lexer": "ipython3",
- "version": "3.10.16"
+ "version": "3.12.3"
  }
  },
  "nbformat": 4,

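Note on the peak-memory report added to the Gemma 3 generation cell: the notebook resets CUDA's peak-memory counter before generation and prints `torch.cuda.max_memory_allocated()` afterwards, formatted by a small `gpu_gb` helper. A minimal sketch of the same pattern as a reusable wrapper follows. `peak_gpu_memory` is a hypothetical helper name not present in the commit, and it assumes that returning `None` is an acceptable result when PyTorch or CUDA is unavailable.

```python
from typing import Any, Callable, Optional, Tuple


def gpu_gb(num_bytes: float) -> str:
    """Format a byte count the way the notebook does (divide by 1024 three times)."""
    return f"{num_bytes / 1024 / 1024 / 1024:.2f} GB"


def peak_gpu_memory(fn: Callable[[], Any]) -> Tuple[Any, Optional[str]]:
    """Run fn() and return (result, peak GPU memory string).

    Hypothetical wrapper: the second element is None when PyTorch is not
    installed or no CUDA device is available.
    """
    try:
        import torch
    except ImportError:
        return fn(), None

    if not torch.cuda.is_available():
        return fn(), None

    torch.cuda.reset_peak_memory_stats()  # start measuring from a clean peak
    result = fn()
    return result, gpu_gb(torch.cuda.max_memory_allocated())


if __name__ == "__main__":
    result, peak = peak_gpu_memory(lambda: sum(range(10)))
    print(result, peak)
```

If `fn` wraps a generator such as `generate_text_basic_stream`, it needs to consume the generator inside `fn` (for example with `list(...)`). Otherwise the work runs after the measurement has already been taken.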