Embeddings use non-causal attention, which requires all tokens to be processed in a single ubatch (n_batch == n_ubatch). The default values differ (n_batch=2048, n_ubatch=512), so users frequently end up with a mismatched configuration.
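
To make the mismatch concrete, here is a minimal sketch of the arithmetic, assuming a batch is split into consecutive chunks of at most n_ubatch tokens. `count_ubatches` is a hypothetical helper for illustration and not a function in the codebase.

```cpp
// Hypothetical helper: how many micro-batches are needed to process
// n_tokens when each micro-batch holds at most n_ubatch tokens.
// Assumes both arguments are positive.
int count_ubatches(int n_tokens, int n_ubatch) {
    return (n_tokens + n_ubatch - 1) / n_ubatch; // ceiling division
}
```

Under that assumption, a full batch at the defaults (2048 tokens, n_ubatch=512) needs 4 ubatches, while n_batch == n_ubatch guarantees that a full batch fits in exactly one.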

Solution

Add parameter validation in main():

Detect when --embedding is enabled and n_batch != n_ubatch
Log warnings explaining the requirement
Automatically set both to min(n_batch, n_ubatch)

Uses the auto-correction approach (suggested by @mirekphd), which gives better UX than strict rejection.
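
The validation steps can be sketched roughly as follows. This is a standalone illustration, not the actual diff: `batch_params` and `fix_embedding_batch_params` are hypothetical names standing in for the server's real parameter struct and the check added to main(), and the warning text is only an example.

```cpp
#include <algorithm>
#include <cstdio>

// Hypothetical stand-in for the relevant fields of the server parameters.
struct batch_params {
    int  n_batch   = 2048;  // logical batch size (-b)
    int  n_ubatch  = 512;   // physical micro-batch size (-ub)
    bool embedding = false; // set by --embedding
};

// If embeddings are enabled and the two sizes differ, warn and set both
// to the smaller value. Returns true when the parameters were adjusted.
bool fix_embedding_batch_params(batch_params & params) {
    if (!params.embedding || params.n_batch == params.n_ubatch) {
        return false; // nothing to correct, stay silent
    }

    const int n = std::min(params.n_batch, params.n_ubatch);

    std::fprintf(stderr,
        "warning: embeddings require n_batch == n_ubatch (got n_batch=%d, n_ubatch=%d)\n",
        params.n_batch, params.n_ubatch);
    std::fprintf(stderr, "warning: setting both to %d\n", n);

    params.n_batch  = n;
    params.n_ubatch = n;
    return true;
}
```

Taking the minimum rather than the maximum keeps the corrected values within whatever limits the user already accepted for either setting.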

Testing

✅ Builds successfully
✅ Validation triggers: -b 2048 -ub 512 --embedding → logs warnings, sets both to 512
✅ No false positives: -b 512 -ub 512 --embedding → silent
✅ Tested on macOS M3 Pro with embedding model

server: validate n_batch == n_ubatch for embeddings (ggml-org#6263)

3827a23

Fixes ggml-org#6263 where server accepts mismatched batch/ubatch values with
embeddings, leading to suboptimal or incorrect behavior.

Problem: Embeddings and reranking use non-causal attention which requires
all tokens to be processed within a single ubatch. When n_batch != n_ubatch,
the configuration is incoherent. Default values differ (n_batch=2048,
n_ubatch=512), so users encounter this frequently.

Solution:
- Add parameter validation in main() after common_params_parse()
- When embeddings enabled and n_batch != n_ubatch:
  * Log warnings explaining the requirement
  * Automatically set both to min(n_batch, n_ubatch)
  * Ensure coherent configuration

This follows the auto-correction approach suggested by @mirekphd
and provides better UX than strict rejection.

Testing:
✅ Builds successfully
✅ Validation triggers: -b 2048 -ub 512 --embedding → logs warnings, adjusts both to 512
✅ No false positives: -b 512 -ub 512 --embedding → no warnings
✅ Verified on macOS M3 Pro with embedding model