Releases · ggml-org/llama.cpp
b4988
3714c3e
Assets: 25
b4987
b4ae508
metal : improve FA + improve MoE (#12612)
* ggml : FA with different K, V head sizes (CPU)
* metal : add FA with HS=192
* metal : extend FA to support different K and V head sizes
* metal : add FA vector kernels for heads K 192 and V 128
* ggml : restrict op on other backends to equal head sizes
* metal : optimize FA-vec kernel
* metal : FA remove mq registers
* metal : improve MoE mul_mat_id condition
* metal : fix comments + remove unnecessary addition
* metal : avoid too much shared memory usage with mul_mat_id
Assets: 26
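As context for the head-size change (a gloss on the commit message above, not text from the release): flash attention here no longer requires the K and V head sizes to match, so a head can use, e.g., $d_k = 192$ with $d_v = 128$ as in the new vector kernels:

$$
O = \operatorname{softmax}\!\left(\frac{Q K^\top}{\sqrt{d_k}}\right) V,
\qquad Q, K \in \mathbb{R}^{n \times d_k},\quad V, O \in \mathbb{R}^{n \times d_v}.
$$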
b4986
b86f600
vulkan: fix coopmat shader generation when cross-compiling (#12272)
Previously the status of coopmat{,2} support wasn't passed to the vulkan-shaders-gen project built on the host, which led to a build failure because the cross-compiling code expected coopmat{,2} shaders that never got generated. Fix this by passing the coopmat{,2} support status to the vulkan-shaders subproject.
* Only call coop-mat shaders once
* Fix whitespace
Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
Co-authored-by: bandoti <141645996+bandoti@users.noreply.github.com>
Assets: 25
b4985
dd373dd
llama: fix error on bad grammar (#12628)
Assets: 25
b4984
5d01670
server : include speculative decoding stats when timings_per_token is…
Assets: 26
b4982
1373176
llamafile : ppc64le GEMV forwarding for FP32 (#12594)
This patch enables usage of MMA when one of the dimensions of the matrix (i.e. either M or N) is 1, which is useful for token generation where N < 2. The concept of "GEMV forwarding" is used: when one of the matrices has a single row/column, the elements are broadcast instead of running the packing routine to prepack the matrix elements. This change results in a 5% - 15% improvement in total speed (i.e. all tokens/total time) across various batch sizes, compared with the corresponding dot-product implementation. The patch was tested with FP32 models of Meta-Llama-3-8B, Mistral-7B, and Llama-2-7B-chat-hf on an IBM POWER10 machine.
Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>
Assets: 26
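The "GEMV forwarding" idea above can be summarized in a short sketch. This is not the llamafile/ppc64le code itself; `pack_and_gemm` is a hypothetical stand-in for the packed MMA path, and the dispatch on `N == 1` is the part the commit describes:

```cpp
#include <cstddef>

// Hypothetical stand-in for the general tiled/packed MMA path.
void pack_and_gemm(const float *A, const float *B, float *C,
                   size_t M, size_t K, size_t N);

// Plain GEMV: row-major A (M x K) times vector x (K), no packing step.
static void gemv(const float *A, const float *x, float *y,
                 size_t M, size_t K) {
    for (size_t i = 0; i < M; ++i) {
        float sum = 0.0f;
        for (size_t k = 0; k < K; ++k) {
            sum += A[i*K + k] * x[k];   // dot product of row i with x
        }
        y[i] = sum;
    }
}

void matmul(const float *A, const float *B, float *C,
            size_t M, size_t K, size_t N) {
    if (N == 1) {
        gemv(A, B, C, M, K);            // forward single-column case to GEMV
        return;
    }
    pack_and_gemm(A, B, C, M, K, N);    // general case keeps the packed path
}
```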
b4981
ab6ab8f
rpc : send hash when tensor data is above some fixed threshold (#12496)
ref #10095
* rpc : put cache under $HOME/.cache/llama.cpp
* try to fix win32 build
* another try to fix win32 build
* remove llama as dependency
Assets: 26
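A minimal sketch of the scheme described above, not the actual RPC code: large tensors are announced by content hash first, and the payload is only transferred when the receiving side does not have it cached. The helper names, the threshold value, and the choice of FNV-1a here are assumptions for illustration:

```cpp
#include <cstddef>
#include <cstdint>

constexpr size_t HASH_THRESHOLD = 10 * 1024 * 1024; // assumed cutoff, not the real value

// FNV-1a, a simple content hash (the real implementation may differ).
static uint64_t fnv1a(const uint8_t *p, size_t n) {
    uint64_t h = 1469598103934665603ull;        // FNV offset basis
    for (size_t i = 0; i < n; ++i) {
        h = (h ^ p[i]) * 1099511628211ull;      // FNV prime
    }
    return h;
}

// Hypothetical transport helpers standing in for the RPC layer.
bool server_has_blob(uint64_t hash);            // server-side cache lookup
void send_blob(const uint8_t *p, size_t n);     // full payload
void send_hash_ref(uint64_t hash);              // hash-only reference

void set_tensor_data(const uint8_t *p, size_t n) {
    if (n < HASH_THRESHOLD) {
        send_blob(p, n);                 // small tensors: send bytes directly
        return;
    }
    const uint64_t h = fnv1a(p, n);
    if (server_has_blob(h)) {
        send_hash_ref(h);                // cache hit: skip the transfer
    } else {
        send_blob(p, n);                 // cache miss: server caches it by hash
    }
}
```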
b4980
2099a9d
server : Support listening on a unix socket (#12613)
* server : Bump cpp-httplib to include AF_UNIX windows support
* server : Allow running the server example on a unix socket
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
Assets: 26
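Since llama-server embeds cpp-httplib, the unix-socket support roughly amounts to binding with `AF_UNIX`. A minimal standalone sketch, assuming a cpp-httplib version with `set_address_family`; the socket path and `/health` handler here are illustrative, not the server's actual wiring:

```cpp
#include <sys/socket.h>   // AF_UNIX
#include "httplib.h"

int main() {
    httplib::Server svr;
    svr.Get("/health", [](const httplib::Request &, httplib::Response &res) {
        res.set_content("{\"status\":\"ok\"}", "application/json");
    });
    svr.set_address_family(AF_UNIX);
    // With AF_UNIX the "host" is a filesystem path; the port is ignored.
    svr.listen("/tmp/llama.sock", 80);
}
```

A client can then reach such a socket with, e.g., `curl --unix-socket /tmp/llama.sock http://localhost/health`.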
b4978
5dec47d
opencl: add multi and vision rope, `gelu_quick` and `im2col` (#12600)
* opencl: add `im2col`
* opencl: add `gelu_quick`
* opencl: add mrope
* opencl: add vision rope
Assets: 26
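For reference, `gelu_quick` is the sigmoid-based GELU approximation that ggml's CPU backend computes; a scalar sketch of the formula the kernel applies element-wise (assuming the usual 1.702 coefficient):

```cpp
#include <cmath>

// gelu_quick(x) = x * sigmoid(1.702 * x) = x / (1 + exp(-1.702 * x))
inline float gelu_quick(float x) {
    return x / (1.0f + std::exp(-1.702f * x));
}
```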
b4977
f125b8d
llama : add PLM GGUF Conversion & Inference Support (#12457)
* add edgellm model arch [conversation feature doesn't work]
* remove output.weight layer for edgellm arch
* [Model] update the name of the model
* update the name of model arch in convert gguf
* [Model] Refactor the model arch into llama-model
* [Bug] Fix the bug in create attn kv
* [Code] Fix editorconfig errors
* [Code] Remove trailing whitespace
* [Code] Change the order of model arch in list
* [Code] Fix flake8 lint errors
* [Code] Remove call in model arch