
ggml-cpu: fix todo comment #15953 and SIMD-like calculate 4 elems #18150


Open

GermanAizek wants to merge 3 commits into ggml-org:master from GermanAizek:cpu-vec-simd

Conversation

@GermanAizek
Contributor

commented Dec 17, 2025 (edited)

@am17an, hi again. Thanks a lot for the earlier tests — I was able to test my changes more thoroughly this time. When you have free time, could you test this branch?

Reference: #1595 (review)

CTest passes everything except one test, which fails with a strange model-not-found error:

```
14 - test-tokenizers-ggml-vocabs (Failed)
Reason:
14/43 Test #14: test-tokenizers-ggml-vocabs .......***Failed    0.36 sec
Already up to date.
main : reading vocab from: '/media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/PLaMo2/ggml-vocab-plamo2.gguf'
gguf_init_from_file_impl: invalid magic characters: 'vers', expected 'GGUF'
llama_model_load: error loading model: llama_model_loader: failed to load model from /media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/PLaMo2/ggml-vocab-plamo2.gguf
llama_model_load_from_file_impl: failed to load model
main: error: failed to load vocab '/media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/PLaMo2/ggml-vocab-plamo2.gguf'
main : reading vocab from: '/media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/RWKV/ggml-vocab-rwkv-7-world.gguf'
gguf_init_from_file_impl: invalid magic characters: 'vers', expected 'GGUF'
llama_model_load: error loading model: llama_model_loader: failed to load model from /media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/RWKV/ggml-vocab-rwkv-7-world.gguf
llama_model_load_from_file_impl: failed to load model
main: error: failed to load vocab '/media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/RWKV/ggml-vocab-rwkv-7-world.gguf'
main : reading vocab from: '/media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/SPM/ggml-vocab-gemma-3.gguf'
gguf_init_from_file_impl: invalid magic characters: 'vers', expected 'GGUF'
llama_model_load: error loading model: llama_model_loader: failed to load model from /media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/SPM/ggml-vocab-gemma-3.gguf
llama_model_load_from_file_impl: failed to load model
main: error: failed to load vocab '/media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/SPM/ggml-vocab-gemma-3.gguf'
main : reading vocab from: '/media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/UGM/ggml-vocab-nomic-bert-moe.gguf'
gguf_init_from_file_impl: invalid magic characters: 'vers', expected 'GGUF'
llama_model_load: error loading model: llama_model_loader: failed to load model from /media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/UGM/ggml-vocab-nomic-bert-moe.gguf
llama_model_load_from_file_impl: failed to load model
main: error: failed to load vocab '/media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/UGM/ggml-vocab-nomic-bert-moe.gguf'
main : reading vocab from: '/media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/WPM/ggml-vocab-jina-v2-en.gguf'
gguf_init_from_file_impl: invalid magic characters: 'vers', expected 'GGUF'
llama_model_load: error loading model: llama_model_loader: failed to load model from /media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/WPM/ggml-vocab-jina-v2-en.gguf
llama_model_load_from_file_impl: failed to load model
main: error: failed to load vocab '/media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/WPM/ggml-vocab-jina-v2-en.gguf'
```

ctest_all_output.txt

My hyperfine tests on a NUMA Xeon 2x E5-2699:

```
devuan@devuan:/media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/cmake-build-release/bin$ hyperfine --warmup 1 -r 5 "./llama-bench -m Llama-3.2-1B-Instruct-Q2_K.gguf -p 512 -n 128"
Benchmark 1: ./llama-bench -m Llama-3.2-1B-Instruct-Q2_K.gguf -p 512 -n 128
  Time (mean ± σ):     32.360 s ±  0.182 s    [User: 1150.270 s, System: 1.218 s]
  Range (min … max):   32.049 s … 32.514 s    5 runs

devuan@devuan:/media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/cmake-build-release/bin$ hyperfine --warmup 1 -r 5 "./llama-bench -m Llama-3.2-1B-Instruct-Q2_K.gguf -p 512 -n 128"
Benchmark 1: ./llama-bench -m Llama-3.2-1B-Instruct-Q2_K.gguf -p 512 -n 128
  Time (mean ± σ):     28.896 s ±  0.267 s    [User: 1024.634 s, System: 1.303 s]
  Range (min … max):   28.568 s … 29.183 s    5 runs
```

Single run (not very accurate on my machine):

tg128 increased, matching the hyperfine results; the average llama-bench execution time fell.

cpu-vec-simd

| model | size | params | backend | threads | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| llama 1B Q2_K - Medium | 546.50 MiB | 1.24 B | CPU | 4 | pp512 | 76.53 ± 0.09 |
| llama 1B Q2_K - Medium | 546.50 MiB | 1.24 B | CPU | 4 | tg128 | 28.97 ± 0.85 |

build: be23f5f (7424)

master

| model | size | params | backend | threads | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| llama 1B Q2_K - Medium | 546.50 MiB | 1.24 B | CPU | 4 | pp512 | 77.00 ± 0.10 |
| llama 1B Q2_K - Medium | 546.50 MiB | 1.24 B | CPU | 4 | tg128 | 27.05 ± 0.73 |

build: d674212 (7421)

@taronaeo
Collaborator

> CTest successful all, but model not found strange

Did you pull via Git LFS? It looks like the models were not downloaded via LFS.

@am17an
Collaborator

If I understand correctly, this only affects the variance calculation, which is used only in `GGML_OP_NORM`, and the model you are testing (llama-1B) uses RMS norm (i.e. `GGML_OP_RMS_NORM`), so I wouldn't expect a change in performance.

@github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Dec 18, 2025

Reviewers

@ggerganov — awaiting requested review (code owner)

At least 1 approving review is required to merge this pull request.

Assignees

No one assigned

Labels

ggml — changes relating to the ggml tensor library for machine learning

Projects

None yet

Milestone

No milestone


3 participants

@GermanAizek @taronaeo @am17an
