Releases · ggml-org/llama.cpp
b5124
bc091a4
This commit was created on GitHub.com and signed with GitHub’s verified signature.
Assets: 26
b5123
a483757
vulkan: use aligned loads for flash attention mask (#12853)
Rewrite the stride logic for the mask tensor in the FA shader to force the stride to be aligned, to allow using more efficient loads.
Assets: 26
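The alignment trick behind #12853 is the usual round-up-to-a-multiple pattern: if each row of the mask starts on an aligned boundary, the shader can use wider vector loads. The actual change lives in the Vulkan FA shader; the helper below is only a hypothetical host-side sketch of the stride computation.

```cpp
#include <cassert>
#include <cstdint>

// Round a row stride (in elements) up to a multiple of `align` so every row
// starts on an aligned boundary, enabling wider (e.g. vec4) loads.
// Hypothetical illustration, not the shader code from the PR.
static uint32_t align_stride(uint32_t stride, uint32_t align) {
    return (stride + align - 1) / align * align;
}
```

Padding the stride this way trades a few wasted elements per row for load efficiency; a stride of 30 with 8-element alignment becomes 32.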
b5122
e59ea53
llava: Fix cpu-only clip image encoding segfault (#12907)
* llava: Fix cpu-only clip image encoding
* clip : no smart ptr for ggml_backend_t
* Fix for backend_ptr push_back
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Assets: 26
b5121
c94085d
server : add VSCode's Github Copilot Chat support (#12896)
* server : add VSCode's Github Copilot Chat support
* cont : update handler name
Assets: 26
b5120
e8a6263
rpc : Set cache directory in rpc-server.cpp on FreeBSD (#12903)
Assets: 26
b5119
b6930eb
`tool-call`: fix non-tool-calling grammar crashes w/ Qwen / Hermes 2 …
Assets: 26
b5118
68b08f3
common : Define cache directory on FreeBSD (#12892)
Assets: 26
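Cache-directory resolution on Unix-like systems such as FreeBSD conventionally follows the XDG Base Directory spec: honor `XDG_CACHE_HOME` if set, otherwise fall back to `$HOME/.cache`. The sketch below illustrates that convention only; the exact paths and subdirectory name used by #12892 are in the PR, and the `llama.cpp` suffix here is an assumption.

```cpp
#include <cstdlib>
#include <string>

// XDG-style cache directory lookup: prefer XDG_CACHE_HOME, then fall back
// to $HOME/.cache, then to the current directory as a last resort.
// Illustrative sketch; not the code merged in the PR.
static std::string cache_dir() {
    if (const char * xdg = std::getenv("XDG_CACHE_HOME")) {
        return std::string(xdg) + "/llama.cpp";
    }
    if (const char * home = std::getenv("HOME")) {
        return std::string(home) + "/.cache/llama.cpp";
    }
    return ".";
}
```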
b5117
578754b
sycl: Support sycl_ext_oneapi_limited_graph (#12873)
The current usage of the SYCL-Graph extension checks for the `sycl_ext_oneapi_graph` device aspect. However, it is also possible to support `sycl_ext_oneapi_limited_graph` devices that don't support update.
Assets: 26
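The distinction #12873 relies on: a device exposing only the limited-graph aspect can still record and replay a command graph, it just cannot update an already-finalized graph. The real check queries SYCL device aspects via the runtime; the enum and helper below are hypothetical stand-ins that only model the selection logic.

```cpp
#include <cassert>

// Hypothetical stand-ins for the SYCL graph aspects (the real ones are
// queried from the SYCL runtime); this models only the capability split.
enum class aspect { ext_oneapi_graph, ext_oneapi_limited_graph, none };

struct graph_caps {
    bool can_record;  // can record/replay a command graph
    bool can_update;  // can update a finalized graph in place
};

// Full graph support implies update; limited support allows record/replay
// only; otherwise fall back to eager submission.
static graph_caps query_graph_caps(aspect a) {
    switch (a) {
        case aspect::ext_oneapi_graph:         return {true,  true };
        case aspect::ext_oneapi_limited_graph: return {true,  false};
        default:                               return {false, false};
    }
}
```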
b5116
b2034c2
contrib: support modelscope community (#12664)
* support download from modelscope
* support login
* remove comments
* add arguments
* fix code
* fix win32
* test passed
* fix readme
* revert readme
* change to MODEL_ENDPOINT
* revert tail line
* fix readme
* refactor model endpoint
* remove blank line
* fix header
* fix as comments
* update comment
* update readme
Co-authored-by: tastelikefeet <yuze.zyz@alibaba-inc.com>
Assets: 25
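The `MODEL_ENDPOINT` change in #12664 suggests endpoint selection is driven by an environment variable, so downloads can be pointed at a mirror such as ModelScope instead of the default hub. The lookup order and default below are assumptions for illustration, not the merged implementation.

```cpp
#include <cstdlib>
#include <string>

// Pick the model download endpoint from MODEL_ENDPOINT if set, otherwise
// fall back to a default hub URL. Sketch only; the real function in
// llama.cpp may check additional variables and a different default.
static std::string model_endpoint() {
    if (const char * e = std::getenv("MODEL_ENDPOINT")) {
        return e;
    }
    return "https://huggingface.co/";
}
```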
b5115
06bb53a
llama-model : add Glm4Model implementation for GLM-4-0414 (#12867)
* GLM-4-0414
* use original one
* Using with tensor map
* fix bug
* change order
* change order
* format with flake8