
ggml-cpu: fix todo comment #15953 and SIMD-like calculate 4 elems #18150


Open

GermanAizek wants to merge 3 commits into ggml-org:master from GermanAizek:cpu-vec-simd

Conversation

@GermanAizek
Contributor

commented Dec 17, 2025 (edited)

@am17an, hi again. Thanks a lot for the earlier tests — I was able to test my changes more thoroughly this time. When you have free time, could you test this branch?

Reference: #1595 (review)

CTest passes everything except one test, which fails with a strange model-not-found error:

```
14 - test-tokenizers-ggml-vocabs (Failed)
Reason:
14/43 Test #14: test-tokenizers-ggml-vocabs .......***Failed    0.36 sec
Already up to date.
main : reading vocab from: '/media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/PLaMo2/ggml-vocab-plamo2.gguf'
gguf_init_from_file_impl: invalid magic characters: 'vers', expected 'GGUF'
llama_model_load: error loading model: llama_model_loader: failed to load model from /media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/PLaMo2/ggml-vocab-plamo2.gguf
llama_model_load_from_file_impl: failed to load model
main: error: failed to load vocab '/media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/PLaMo2/ggml-vocab-plamo2.gguf'
main : reading vocab from: '/media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/RWKV/ggml-vocab-rwkv-7-world.gguf'
gguf_init_from_file_impl: invalid magic characters: 'vers', expected 'GGUF'
llama_model_load: error loading model: llama_model_loader: failed to load model from /media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/RWKV/ggml-vocab-rwkv-7-world.gguf
llama_model_load_from_file_impl: failed to load model
main: error: failed to load vocab '/media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/RWKV/ggml-vocab-rwkv-7-world.gguf'
main : reading vocab from: '/media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/SPM/ggml-vocab-gemma-3.gguf'
gguf_init_from_file_impl: invalid magic characters: 'vers', expected 'GGUF'
llama_model_load: error loading model: llama_model_loader: failed to load model from /media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/SPM/ggml-vocab-gemma-3.gguf
llama_model_load_from_file_impl: failed to load model
main: error: failed to load vocab '/media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/SPM/ggml-vocab-gemma-3.gguf'
main : reading vocab from: '/media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/UGM/ggml-vocab-nomic-bert-moe.gguf'
gguf_init_from_file_impl: invalid magic characters: 'vers', expected 'GGUF'
llama_model_load: error loading model: llama_model_loader: failed to load model from /media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/UGM/ggml-vocab-nomic-bert-moe.gguf
llama_model_load_from_file_impl: failed to load model
main: error: failed to load vocab '/media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/UGM/ggml-vocab-nomic-bert-moe.gguf'
main : reading vocab from: '/media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/WPM/ggml-vocab-jina-v2-en.gguf'
gguf_init_from_file_impl: invalid magic characters: 'vers', expected 'GGUF'
llama_model_load: error loading model: llama_model_loader: failed to load model from /media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/WPM/ggml-vocab-jina-v2-en.gguf
llama_model_load_from_file_impl: failed to load model
main: error: failed to load vocab '/media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/models/ggml-vocabs/WPM/ggml-vocab-jina-v2-en.gguf'
```

ctest_all_output.txt

My hyperfine tests on a NUMA Xeon 2x E5-2699:

```
devuan@devuan:/media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/cmake-build-release/bin$ hyperfine --warmup 1 -r 5 "./llama-bench -m Llama-3.2-1B-Instruct-Q2_K.gguf -p 512 -n 128"
Benchmark 1: ./llama-bench -m Llama-3.2-1B-Instruct-Q2_K.gguf -p 512 -n 128
  Time (mean ± σ):     32.360 s ±  0.182 s    [User: 1150.270 s, System: 1.218 s]
  Range (min … max):   32.049 s … 32.514 s    5 runs

devuan@devuan:/media/devuan/437889e5-f1cd-4f29-84a0-605eacc7cd49/GIT/llama.cpp/cmake-build-release/bin$ hyperfine --warmup 1 -r 5 "./llama-bench -m Llama-3.2-1B-Instruct-Q2_K.gguf -p 512 -n 128"
Benchmark 1: ./llama-bench -m Llama-3.2-1B-Instruct-Q2_K.gguf -p 512 -n 128
  Time (mean ± σ):     28.896 s ±  0.267 s    [User: 1024.634 s, System: 1.303 s]
  Range (min … max):   28.568 s … 29.183 s    5 runs
```

Single run (not very accurate on my machine):

tg128 increased, matching the hyperfine results; the average llama-bench execution time fell.

cpu-vec-simd

| model | size | params | backend | threads | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| llama 1B Q2_K - Medium | 546.50 MiB | 1.24 B | CPU | 4 | pp512 | 76.53 ± 0.09 |
| llama 1B Q2_K - Medium | 546.50 MiB | 1.24 B | CPU | 4 | tg128 | 28.97 ± 0.85 |

build: be23f5f (7424)

master

| model | size | params | backend | threads | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| llama 1B Q2_K - Medium | 546.50 MiB | 1.24 B | CPU | 4 | pp512 | 77.00 ± 0.10 |
| llama 1B Q2_K - Medium | 546.50 MiB | 1.24 B | CPU | 4 | tg128 | 27.05 ± 0.73 |

build: d674212 (7421)

@taronaeo
Collaborator

> CTest successful all, but model not found strange

Did you pull via Git LFS? It looks like the models were not downloaded via LFS.

@am17an
Collaborator

If I understand correctly, this only affects the variance calculation, which is used only in `GGML_OP_NORM`, and the model you are testing (llama-1B) uses RMS norm (i.e. `GGML_OP_RMS_NORM`), so I wouldn't expect a change in performance.

@github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Dec 18, 2025

Reviewers

@ggerganov — awaiting requested review (code owner)

At least 1 approving review is required to merge this pull request.

Assignees

No one assigned

Labels

ggml — changes relating to the ggml tensor library for machine learning

Projects

None yet

Milestone

No milestone


3 participants

@GermanAizek @taronaeo @am17an
