Project status #3471

Closed
ggerganov started this conversation in General
Oct 4, 2023 · 4 comments · 10 replies

[NO LONGER UPDATED]

Below is a summary of the functionality provided by the llama.cpp project.

  • The goal is to have a bird's-eye view of what works and what does not
  • Collaborators are encouraged to add things to the list and update the status of existing entries as needed
  • The list should stay simple, without too many details about specific problems - those belong in dedicated issues

Legend (feel free to update):

✅ - Working correctly
☁️ - Partially working
❌ - Failing
❓ - Status unknown (needs testing)
🔬 - Under investigation
🚧 - Currently in development

| Feature | Executable | Status | Issues |
| --- | --- | --- | --- |
| **Inference** | | | |
| Single-batch decoding | `main`, `simple` | | |
| Parallel / batched decoding | `batched` | | |
| Continuous batching | `parallel` | | |
| Speculative sampling | `speculative` | | |
| Tree-based speculative sampling | `speculative` | | |
| Self-speculative sampling | `speculative` | 🚧 | #3565 |
| Lookahead sampling | `lookahead` | | |
| Infill | `infill` | | |
| REST API | `server` | | |
| Embeddings | `embedding` | | |
| Grouped Query Attention CPU | `main` | | |
| Grouped Query Attention CUDA | `main` | | |
| Grouped Query Attention OpenCL | `main` | | |
| Grouped Query Attention Metal | `main` | | |
| Session load / save | `main` | | |
| K-quants (256) CUDA | `main` | | |
| K-quants (64) CUDA | `main` | | |
| K-quants (256) Metal | `main` | | |
| K-quants (64) Metal | `main` | ☁️ | #3276 |
| Special tokens | `main` | | |
| Grammar sampling | `main`, `server` | | |
| Beam search | `beam-search` | | #3471 (comment) |
| LoRA | `main` | ☁️ | #3333, #3519 |
| SPM tokenizer | `test-tokenizer-0-llama` | | |
| BPE tokenizer | `test-tokenizer-0-falcon` | | |
| **Models** | | | |
| LLaMA v1 | `main` | | |
| LLaMA v2 | `main` | | |
| Falcon | `main` | | |
| StarCoder | `main` | | |
| Baichuan | `main` | | |
| MPT | `main` | | |
| Persimmon | `main` | | |
| LLaVA | `llava` | | |
| Refact | `main` | | |
| Bloom | `main` | | |
| StableLM-3b-4e1t | `main` | | |
| **Training** | | | |
| Finetuning CPU | `finetune` | | |
| Finetuning Metal | `finetune` | 🔬 | |
| **Backends** | | | |
| CPU x64 | `ggml` | | |
| CPU Arm | `ggml` | | |
| GPU CUDA | `ggml-cuda` | | |
| GPU ROCm | `ggml-cuda` | | |
| GPU Metal | `ggml-metal` | | |
| GPU OpenCL | `ggml-opencl` | | |
| GPU Vulkan | `ggml-vulkan` | 🚧 | #2059 |

Replies: 4 comments 10 replies


What does the "☁️" mean?

2 replies
@shibe2

I don't know what the icon means, but the current status of the OpenCL back-end is: it works with supported models, but it is buggy and perhaps slower than it could be.

@ggerganov (Maintainer, Author) · Oct 4, 2023

Yup, this was my impression from reading a few issues lately. If you think that's not the case, feel free to update it. I just haven't set up OpenCL in my environment and cannot run tests atm.


So "Parallel decoding" is done bybatched and "Continuous batching" is done byparallel? Are these reversed?

1 reply
@ggerganov (Maintainer, Author) · Oct 5, 2023

Parallel decoding is also called "batched decoding", hence `batched`. The `parallel` example demonstrates a basic server that serves clients in parallel - it just happens to have the continuous batching feature as an option.

Naming things is hard :) Sorry if these are confusing
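
To make the distinction concrete, here is a hypothetical toy scheduler (plain C++, not the actual `parallel` example) showing the core idea of continuous batching: all busy slots decode one token per step as a single batch, and a slot whose sequence finishes is refilled from the pending queue right away instead of waiting for the whole batch to drain:

```cpp
#include <cstdio>
#include <deque>
#include <string>
#include <vector>

struct Request { std::string name; int remaining; }; // tokens left to generate
struct Slot    { Request req; bool busy = false; };

int main() {
    std::deque<Request> pending = {
        {"req A", 3}, {"req B", 1}, {"req C", 2}, {"req D", 2},
    };
    std::vector<Slot> slots(2); // batch width: 2 sequences decoded per step

    auto any_busy = [&] {
        for (const auto & s : slots) if (s.busy) return true;
        return false;
    };

    for (int step = 0; !pending.empty() || any_busy(); ++step) {
        // refill free slots from the queue - this is the "continuous" part
        for (auto & s : slots) {
            if (!s.busy && !pending.empty()) {
                s.req = pending.front();
                pending.pop_front();
                s.busy = true;
            }
        }
        // one batched decode step: every busy slot produces one token
        for (auto & s : slots) {
            if (!s.busy) continue;
            std::printf("step %d: decode one token for %s\n", step, s.req.name.c_str());
            if (--s.req.remaining == 0) s.busy = false; // slot frees mid-run
        }
    }
    return 0;
}
```

With a batch width of 2 and the four queued requests above, "req B" finishes after one step and "req C" takes over its slot on the very next step while "req A" is still decoding, so the batch stays full.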


Should beam search be added here? I think it is broken atm, at least with CUDA.

4 replies
@ggerganov (Maintainer, Author) · Oct 8, 2023

Yes, it should be added. The list is far from complete.

@Mihaiii

Fwiw, for me beam search is broken even without CUDA, in the sense that when I run the example, nothing happens (it just hangs for minutes at this line until I Ctrl+C it).

If it's an unknown problem, I'll open an issue (tbh, it's strange that nobody has mentioned it before, so maybe I'm doing something wrong).

Update: when it hangs on the above-mentioned line, I see 0 hard page faults/sec.

@slaren

With CUDA it works for a while, but then it starts generating gibberish. I think the calls to `llama_decode` are failing and it is not catching that. It's probably missing some KV cache management after the batched decoding change.
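
As a sketch of what catching this could look like (a hypothetical helper, not code from the beam-search example, assuming the return-value convention documented in llama.h: 0 means success, a positive value means no KV cache slot was found for the batch, and a negative value is a fatal error):

```cpp
#include "llama.h"

#include <cstdio>

// Hypothetical wrapper: decode a batch and surface failures instead of
// silently continuing (silent failures are one way to end up with gibberish).
static bool decode_checked(llama_context * ctx, llama_batch batch) {
    const int ret = llama_decode(ctx, batch);
    if (ret < 0) {
        std::fprintf(stderr, "llama_decode: fatal error %d\n", ret);
        return false;
    }
    if (ret > 0) {
        // Not fatal, but the batch was not processed: the KV cache is full.
        // The caller needs to free space (e.g. remove finished sequences with
        // llama_kv_cache_seq_rm) and retry - the kind of KV cache management
        // that may be missing here.
        std::fprintf(stderr, "llama_decode: no KV cache slot for batch (%d)\n", ret);
        return false;
    }
    return true;
}
```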

@ggerganov (Maintainer, Author) · Oct 18, 2023

The beam search functionality should be moved out of the library and implemented as a standalone example.


What would be the criteria for considering the OpenCL back-end to be working correctly? I've fixed all known bugs in ggml-opencl.cpp and am now working on refactoring, like #3669.

3 replies
@ggerganov (Maintainer, Author) · Oct 18, 2023

The criterion is: if it runs correctly on your machine, then it is ✅ until someone reports a reproducible problem; at that point it becomes ☁️ or ❌, depending on how broken the thing is.

@shibe2

Alright, turning on the green light then!

@Yossef-Dawoad

Maybe you could ditch the icons for letter grades (A+, A, A-, B, ...). That would make it obvious when something works fine but still needs improvement (it gets an A-), and so on. Maybe something like this:

[ A+ ] or [ A ]: working like a charm
[ A- ]: working correctly but needs improvement
[ B ]: partially working
[ B- ]: partially working, with big issues to be resolved
[ C ]: status unknown (needs testing)
[ D+ ]: under investigation
[ D ]: currently in development
[ F ]: failing

Maybe you should also add a column for support tier, e.g. whether a feature is tier 1 or tier 2. What do you think?
