Description
I know it is already possible to use Ollama servers with pydantic-ai via Ollama's OpenAI-compatible interface; however, there are a few reasons why I would be interested in a direct llama-cpp binding:
- Tool use and Hugging Face compatibility: when using Ollama with Hugging Face models that support tool use (e.g. GLM-4-32B), Ollama reports that the model "does not support tool use", while llama-cpp handles it correctly
- Standalone file: llama-cpp does not need to start a separate server, which matters for ease of use and portability, as everything can live in a single Python file (see the sketch after this list)
- Better configurability/native performance? I am less sure about this one, but llama-cpp-python appears to be faster and to have more optimizations built in than the Ollama server
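To illustrate the standalone-file point, here is a minimal sketch of running a local GGUF model in-process with llama-cpp-python; the model path is a placeholder:

```python
from llama_cpp import Llama

# Load a local GGUF model directly in the Python process;
# no separate server needs to be started.
llm = Llama(
    model_path="./models/glm-4-32b-q4_k_m.gguf",  # placeholder path
    n_ctx=8192,        # context window size
    n_gpu_layers=-1,   # offload all layers to the GPU if available
)

# llama-cpp-python exposes an OpenAI-style chat completion API.
result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}],
)
print(result["choices"][0]["message"]["content"])
```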
This is why I would like to add a llama-cpp Model to pydantic-ai.
Would there be interest in a PR for this? I have already started working on it for a personal project, so it is only a matter of packaging, adding tests, etc.
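As a rough sketch of the intended usage, it could look something like the following; `LlamaCppModel` and its import path are assumptions for illustration, not part of the current pydantic-ai API:

```python
from pydantic_ai import Agent
# Hypothetical import: this module and class do not exist yet.
from pydantic_ai.models.llama_cpp import LlamaCppModel

# Hypothetical constructor mirroring llama-cpp-python's options.
model = LlamaCppModel(model_path="./models/glm-4-32b-q4_k_m.gguf")
agent = Agent(model)

result = agent.run_sync("What is the capital of France?")
print(result.output)
```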
References
No response