Releases · ggml-org/llama.cpp
b4988
3714c3e
Assets: 25
b4987
b4ae508
metal : improve FA + improve MoE (#12612)
* ggml : FA with different K, V head sizes (CPU)
* metal : add FA with HS=192
* metal : extend FA to support different K and V head sizes
* metal : add FA vector kernels for heads K 192 and V 128
* ggml : restrict op on other backends to equal head sizes
* metal : optimize FA-vec kernel
* metal : FA remove mq registers
* metal : improve MoE mul_mat_id condition
* metal : fix comments + remove unnecessary addition
* metal : avoid too much shared memory usage with mul_mat_id
Assets: 26
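As context for the head-size change (a gloss on the commit message above, not text from the release): flash attention here no longer requires the K and V head sizes to match, so a head can use, e.g., $d_k = 192$ with $d_v = 128$ as in the new vector kernels:

$$
O = \operatorname{softmax}\!\left(\frac{Q K^\top}{\sqrt{d_k}}\right) V,
\qquad Q, K \in \mathbb{R}^{n \times d_k},\quad V, O \in \mathbb{R}^{n \times d_v}.
$$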
b4986
b86f600
vulkan: fix coopmat shader generation when cross-compiling (#12272)
Previously the status of coopmat{,2} support wasn't passed to the vulkan-shaders-gen project built on the host, which led to a build failure because the cross-compiling code expected coopmat{,2} shaders that never got generated. Fix this by passing the coopmat{,2} support status to the vulkan-shaders subproject.
* Only call coop-mat shaders once
* Fix whitespace
Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
Co-authored-by: bandoti <141645996+bandoti@users.noreply.github.com>
Assets: 25
b4985
dd373dd
llama: fix error on bad grammar (#12628)
Assets: 25
b4984
5d01670
server : include speculative decoding stats when timings_per_token is…
Assets: 26
b4982
1373176
llamafile : ppc64le GEMV forwarding for FP32 (#12594)
This patch enables usage of MMA when one of the dimensions of the matrix (i.e. either M or N) is 1, which is useful for token generation where N < 2. The concept of "GEMV forwarding" is used: when one of the matrices has a single row/column, the elements are broadcast instead of running the packing routine to prepack the matrix elements. This change results in a 5% - 15% improvement in total speed (i.e. all tokens/total time) across various batch sizes, compared with the corresponding dot-product implementation. The patch was tested with FP32 models of Meta-Llama-3-8B, Mistral-7B, and Llama-2-7B-chat-hf on an IBM POWER10 machine.
Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>
Assets: 26
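The "GEMV forwarding" idea above can be summarized in a short sketch. This is not the llamafile/ppc64le code itself; `pack_and_gemm` is a hypothetical stand-in for the packed MMA path, and the dispatch on `N == 1` is the part the commit describes:

```cpp
#include <cstddef>

// Hypothetical stand-in for the general tiled/packed MMA path.
void pack_and_gemm(const float *A, const float *B, float *C,
                   size_t M, size_t K, size_t N);

// Plain GEMV: row-major A (M x K) times vector x (K), no packing step.
static void gemv(const float *A, const float *x, float *y,
                 size_t M, size_t K) {
    for (size_t i = 0; i < M; ++i) {
        float sum = 0.0f;
        for (size_t k = 0; k < K; ++k) {
            sum += A[i*K + k] * x[k];   // dot product of row i with x
        }
        y[i] = sum;
    }
}

void matmul(const float *A, const float *B, float *C,
            size_t M, size_t K, size_t N) {
    if (N == 1) {
        gemv(A, B, C, M, K);            // forward single-column case to GEMV
        return;
    }
    pack_and_gemm(A, B, C, M, K, N);    // general case keeps the packed path
}
```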
b4981
ab6ab8f
rpc : send hash when tensor data is above some fixed threshold (#12496)
ref #10095
* rpc : put cache under $HOME/.cache/llama.cpp
* try to fix win32 build
* another try to fix win32 build
* remove llama as dependency
Assets: 26
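A minimal sketch of the scheme described above, not the actual RPC code: large tensors are announced by content hash first, and the payload is only transferred when the receiving side does not have it cached. The helper names, the threshold value, and the choice of FNV-1a here are assumptions for illustration:

```cpp
#include <cstddef>
#include <cstdint>

constexpr size_t HASH_THRESHOLD = 10 * 1024 * 1024; // assumed cutoff, not the real value

// FNV-1a, a simple content hash (the real implementation may differ).
static uint64_t fnv1a(const uint8_t *p, size_t n) {
    uint64_t h = 1469598103934665603ull;        // FNV offset basis
    for (size_t i = 0; i < n; ++i) {
        h = (h ^ p[i]) * 1099511628211ull;      // FNV prime
    }
    return h;
}

// Hypothetical transport helpers standing in for the RPC layer.
bool server_has_blob(uint64_t hash);            // server-side cache lookup
void send_blob(const uint8_t *p, size_t n);     // full payload
void send_hash_ref(uint64_t hash);              // hash-only reference

void set_tensor_data(const uint8_t *p, size_t n) {
    if (n < HASH_THRESHOLD) {
        send_blob(p, n);                 // small tensors: send bytes directly
        return;
    }
    const uint64_t h = fnv1a(p, n);
    if (server_has_blob(h)) {
        send_hash_ref(h);                // cache hit: skip the transfer
    } else {
        send_blob(p, n);                 // cache miss: server caches it by hash
    }
}
```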
b4980
2099a9d
server : Support listening on a unix socket (#12613)
* server : Bump cpp-httplib to include AF_UNIX windows support
* server : Allow running the server example on a unix socket
Signed-off-by: Piotr Stankiewicz <piotr.stankiewicz@docker.com>
Assets: 26
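Since llama-server embeds cpp-httplib, the unix-socket support roughly amounts to binding with `AF_UNIX`. A minimal standalone sketch, assuming a cpp-httplib version with `set_address_family`; the socket path and `/health` handler here are illustrative, not the server's actual wiring:

```cpp
#include <sys/socket.h>   // AF_UNIX
#include "httplib.h"

int main() {
    httplib::Server svr;
    svr.Get("/health", [](const httplib::Request &, httplib::Response &res) {
        res.set_content("{\"status\":\"ok\"}", "application/json");
    });
    svr.set_address_family(AF_UNIX);
    // With AF_UNIX the "host" is a filesystem path; the port is ignored.
    svr.listen("/tmp/llama.sock", 80);
}
```

A client can then reach such a socket with, e.g., `curl --unix-socket /tmp/llama.sock http://localhost/health`.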
b4978
5dec47d
opencl: add multi and vision rope, `gelu_quick` and `im2col` (#12600)
* opencl: add `im2col`
* opencl: add `gelu_quick`
* opencl: add mrope
* opencl: add vision rope
Assets: 26
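For reference, `gelu_quick` is the sigmoid-based GELU approximation that ggml's CPU backend computes; a scalar sketch of the formula the kernel applies element-wise (assuming the usual 1.702 coefficient):

```cpp
#include <cmath>

// gelu_quick(x) = x * sigmoid(1.702 * x) = x / (1 + exp(-1.702 * x))
inline float gelu_quick(float x) {
    return x / (1.0f + std::exp(-1.702f * x));
}
```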
b4977
f125b8d
llama : add PLM GGUF Conversion & Inference Support (#12457)
* add edgellm model arch [conversation feature doesn't work]
* remove output.weight layer for edgellm arch
* [Model] update the name of the model
* update the name of model arch in convert gguf
* [Model] Refactor the model arch into llama-model
* [Bug] Fix the bug in create attn kv
* [Code] Fix editorconfig errors
* [Code] Remove trailing whitespace
* [Code] Change the order of model arch in list
* [Code] Fix flake8 lint errors
* [Code] Remove call in model arch