Hugging Face Inference Endpoints now supports GGUF out of the box! #9669
You can now deploy any GGUF model on your own endpoint, in just a few clicks! Simply select GGUF, select a hardware configuration, and done: an endpoint powered by llama-server (built from …) is deployed for you.

Try it now --> https://ui.endpoints.huggingface.co/

And the best part is: …
A huge thanks to @ggerganov, @slaren, and the @huggingface team for making this possible!

(video: llama.hfe.ok.mp4)
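Once deployed, the endpoint behaves like any llama-server instance. For reference (not part of the original post), here is a minimal sketch of querying it over the OpenAI-compatible `/v1/chat/completions` route that llama-server exposes; the endpoint URL and token below are placeholders:

```python
import requests

# Placeholders: substitute your own endpoint URL and Hugging Face token.
ENDPOINT_URL = "https://your-endpoint.endpoints.huggingface.cloud"
HF_TOKEN = "hf_..."

# llama-server exposes an OpenAI-compatible chat completions route.
resp = requests.post(
    f"{ENDPOINT_URL}/v1/chat/completions",
    headers={"Authorization": f"Bearer {HF_TOKEN}"},
    json={
        "messages": [{"role": "user", "content": "Hello! What can you do?"}],
        "max_tokens": 128,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```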
Replies: 1 comment
The Hermes 405B model can be deployed on 2×A100. The generation speed is around 8 t/s, which is not bad!
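Not from the original reply, but a rough way to check a number like that yourself: time a non-streamed request and divide the reported completion tokens by wall-clock time (this includes prompt processing, so it is a lower bound on pure generation speed). Same placeholder URL and token assumptions as above, and it assumes the server returns an OpenAI-style `usage` block:

```python
import time

import requests

ENDPOINT_URL = "https://your-endpoint.endpoints.huggingface.cloud"  # placeholder
HF_TOKEN = "hf_..."  # placeholder

start = time.time()
resp = requests.post(
    f"{ENDPOINT_URL}/v1/chat/completions",
    headers={"Authorization": f"Bearer {HF_TOKEN}"},
    json={
        "messages": [{"role": "user", "content": "Write a short story about a llama."}],
        "max_tokens": 256,
    },
    timeout=600,
)
resp.raise_for_status()
elapsed = time.time() - start

# OpenAI-compatible responses report token counts in a "usage" block.
tokens = resp.json()["usage"]["completion_tokens"]
print(f"{tokens} tokens in {elapsed:.1f}s -> {tokens / elapsed:.1f} t/s")
```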